Abstract

The analysis of the channel capacity in the absence of prior channel knowledge (noncoherent
channel) has gained increasing interest in recent years, but it is still unknown for the general case.
In this paper we derive bounds on the capacity of the noncoherent, underspread, complex Gaussian,
orthogonal frequency division multiplexing (OFDM), wide sense stationary channel with uncorrelated
scattering (WSSUS), under a peak power constraint or a constraint on the second and fourth moments
of the transmitted signal. These bounds are characterized only by the system signal-to-noise ratio (SNR)
and by a newly defined quantity termed effective coherence time. Analysis of the effective coherence
time reveals that it can be interpreted as the length of a block in the block fading model in which a
system with the same SNR will achieve the same capacity as in the analyzed channel. Unlike commonly
used coherence time definitions, it is shown that the effective coherence time depends on the SNR, and
is a nonincreasing function of it.

We show that for low SNR the capacity is proportional to the effective coherence time, while for
higher SNR the coherent channel capacity can be achieved provided that the effective coherence time
is large enough.

I. Introduction 

The analysis of communication systems is often performed under the assumption of perfect 
channel knowledge. In practical systems, however, the channel needs to be estimated from the 
communication signal itself, making this assumption unrealistic. 

^ School of Engineering, Bar-Ilan University, 52900 Ramat-Gan, Israel e-mail : bergeli@eng.biu.ac.il 

* Dipartimento di Elettronica - Politecnico di Torino e-mail: benedetto@polito.it

This work has been funded by PRIMO, a research project financed by MIUR, the Italian Ministry of Education and Research

* Corresponding author. 

July 5, 2011 DRAFT 



SUBMITTED TO THE IEEE TRANSACTIONS ON INFORMATION THEORY

In this paper we study the capacity of the noncoherent, underspread, complex Gaussian,
orthogonal frequency division multiplexing (OFDM), wide sense stationary channel with uncorrelated
scattering (WSSUS), under a peak power constraint or a constraint on the second and fourth
moments of the transmitted signal (commonly termed quadratic power constraint). We use the
term noncoherent channel capacity to describe the capacity of the channel when neither the
transmitter nor the receiver has any prior knowledge of the channel realization, but both have
exact knowledge of the channel statistics.

The OFDM model provides a simple representation of an underspread frequency selective 
channel [1]. In underspread channels the channel delay spread is significantly smaller than the 
channel coherence time [2]. Choosing the OFDM symbol length to be significantly larger than 
the channel delay spread, and yet significantly smaller than the channel coherence time, results 
in a useful approximation of the underspread channel which is both accurate and convenient for 
the analysis. (A detailed description of the connection between the OFDM model and the general
WSSUS channel can be found in [3], [4].)

This paper presents novel bounds on the capacity of the noncoherent underspread WSSUS 
channel that are characterized only by the system signal-to-noise ratio (SNR) and by a newly 
defined quantity termed effective coherence time. We show that the bounds are good for almost
all of the SNR range as long as the effective coherence time is significantly larger than the 
OFDM symbol length. Analysis of the effective coherence time reveals that it characterizes the 
system ability to estimate the channel, and can be interpreted as the length of a block in the 
block fading model that achieves the same capacity as the analyzed channel at the same SNR. 
Surprisingly, and unlike standard coherence time definitions, the effective coherence time is 
actually a function of the system SNR. This dependence of the effective coherence time on SNR 
stems from the fact that at high SNR the system is more sensitive to changes in the channel, and 
hence will effectively see a shorter coherence time.^1 The paper also includes a detailed study of
the relation between the effective coherence time and the system SNR. 

^1 A simple intuition for the decrease of coherence time with SNR is as follows. Consider an intuitive definition of coherence
time for stationary channels: The coherence time is the maximal time interval for which the channel auto-correlation function
does not decrease below a pre-specified threshold. The question is how to select the threshold value. Intuitively, a more sensitive
system (higher SNR) will choose a higher threshold value, and hence will see a shorter coherence time.

The noncoherent underspread WSSUS channel capacity was well characterized for the high
SNR limit and for the low SNR limit (which is equivalent to the large bandwidth limit). For
the low SNR limit, the capacity of the noncoherent underspread channel was shown to be
proportional to $\rho_x^2 T_c/2$, where $\rho_x$ is the SNR and $T_c$ is the channel coherence time [3]-[7].
These results were obtained for different channel models with different constraints. Medard 
and Gallager [5] analyzed a stationary wideband channel with quadratic power constraint, and 
presented a novel definition of the channel coherence time which is calculated from the channel 
correlation function. In this case, the proportionality constant is the ratio between the transmitted 
signal fourth moment and the square of its second moment (also termed Kurtosis). The results 
in that work were derived for the wide bandwidth limit (in which the capacity depends on the 
number of resolvable channel taps, see also [8], [9]). In this paper, the channel bandwidth is 
finite, and we address the above result only in its low SNR limit interpretation. 

Sethuraman and Hajek [6] considered the block fading model with a peak power constraint. 
In that case the channel coherence time is the length of a block, and the proportionality constant 
was shown to be the ratio between the peak and average powers. They also considered a 
flat fading stationary channel model (i.e., without multipath spread) and achieved equivalent 
results, but using measures of the channel spectral distribution function (with no definition of 
channel coherence time). In a later work together with Wang and Lapidoth [7], they considered
delay-separable frequency selective fading channels, and showed that the capacity is actually
proportional to $\rho_x^2 (T_c - 1)/2$; here the definition of the channel coherence time is identical
to the one used by Medard and Gallager [5], but the term coherence time is not used. (Note that
for underspread channels the difference between $T_c$ and $T_c - 1$ is negligible.)

Durisi et al. used a similar channel model, but paid more attention to the discretization of the
continuous time channel (and did not require the channel to be delay separable). As a result,
they worked with a time-frequency channel transfer function (which resembles the
OFDM model used herein). They also did not use the term coherence time, but their results
revealed the same behavior of the capacity in the low SNR limit, using a channel measure which
is an extension of Medard and Gallager's [5] coherence time to the time-frequency channel. (In
their work they also analyzed the capacity with a per-frequency peak power constraint. This power
constraint is not discussed herein, as it is quite different and leads to a significantly different
capacity behavior.)

For the high SNR limit, the main issue is whether the channel estimation error can decrease
to zero as the SNR grows to infinity. The channel fading is termed regular if it cannot be
completely predicted based on the full knowledge of its past [10]. An alternative definition uses the
covariance matrix of the channel fading, and the channel is said to be regular if this matrix is
full rank [11]. For regular channels, at high enough SNR, the capacity in the absence of channel
knowledge is significantly lower than the capacity with perfect channel knowledge, and grows
only double logarithmically with SNR [12], [13], [14]. On the other hand, for irregular channels
the capacity can continue to grow logarithmically with SNR (with possible degradation in the
pre-log constant) [13], [14].

Interestingly, no capacity expression directly depends on the common definition of coherence 
time (e.g., [15]), i.e., the inverse of the channel Doppler spread. For stationary channels, the only 
coherence time definition that was shown to be proportional to the capacity is the one given by 
Medard and Gallager [5], which only applies in the low SNR limit. The definition of effective 
coherence time in this paper coincides with Medard's and Gallager's coherence time definition 
in the low SNR limit, and extends it to higher SNR. Thus, the effective coherence time can 
characterize the channel capacity for all SNRs. 

The paper is organized as follows: the notation and channel model are presented in Sections 
II and III respectively. The main results, including the definition of the effective coherence 
time and the capacity bounds are given in Section IV. A discussion of the results including the 
properties of the effective coherence time, analysis of the bounding gap, numerical examples and 
comparison to known results are given in Section V. Note that the proof of each of the theorems 
(providing capacity bounds) is partly based on information theory and partly on estimation theory. 
For convenience, we collect those parts in different sections. Hence, Theorem n (n = 1,2,3)
is proved by Lemma n.a in Section VI (information theory part) and Lemma n.b in Section VII
(estimation theory part). Concluding remarks are given in Section VIII.

II. Notation 

Throughout the paper we use Roman boldface lower case and upper case letters to denote
random scalars and vectors, respectively (e.g., $\mathbf{x}$, $\mathbf{X}$). Roman italic lower case and upper case
letters are used for deterministic scalars and vectors, respectively (e.g., $x$, $X$). Deterministic
matrices are represented by sans-serif letters and random matrices by calligraphic letters (e.g.,
$\mathsf{X}$, $\mathcal{X}$). Exceptions to this rule are scalar quantities that are commonly denoted by uppercase
letters. These include the channel parameters $N$ and $L$, the coherence time symbols ($T_c$, $\tilde T_c(\cdot)$
and $\tilde T_{c0}$), and the capacity symbol $C$. Other exceptions are operators that can result in a
deterministic or random quantity according to their operand, namely the spectrum operator $S_{\cdot}$ (see
definition below) and the symbol set operator $\mathcal{X}(\cdot)$ (see definition in Lemma 1.b).

The identity matrix is denoted by $I$; $0_N$ and $1_N$ denote the $N \times 1$ vectors of all zeros and
all ones, respectively. A $\otimes$ symbolizes the Kronecker product, a $\dagger$ stands for transposition and
complex conjugation, $\|A\|$ denotes the absolute value of the determinant of the matrix $A$, and
$\mathrm{diag}(X)$ is the diagonal matrix with the elements of the vector $X$ on its diagonal. The vector
stacking is defined as: $X_a^b = [X_a^T, \ldots, X_b^T]^T$. The $b,d$ element of a matrix $A$ is denoted as $(A)_{b,d}$
and the $b$-th element of the vector $X_k$ is denoted as $x_{k,b}$.

For discrete Fourier transform (DFT) analysis, $F$ denotes the DFT matrix with elements:

$$(F)_{m,n} = \frac{1}{\sqrt{N}} e^{-2j\pi mn/N}. \tag{1}$$

$\tilde X = F^\dagger X$ denotes the DFT of a vector, and each element in this vector is termed a frequency
bin. The spectrum of a vector is marked by:

$$S_X = \mathrm{diag}(\tilde X)\,\mathrm{diag}(\tilde X)^\dagger. \tag{2}$$

The cross-covariance matrix of the vectors $X$ and $Y$ is denoted as:

$$C_{X,Y} = E\left[(X - E[X])(Y - E[Y])^\dagger\right] \tag{3}$$

and the auto-covariance matrix of the vector $X$ is denoted as $C_X = C_{X,X}$.

Finally, any matrix ordering in this paper is in the positive definite sense, i.e., $A \geq B$ means
that $A - B$ is positive semi-definite.

III. System model

A. Channel model 

We use the OFDM model [1], [4], [16], [17], in which the DFT of the received signal is given
by:

$$\tilde Y_k = \sqrt{N}\,\mathrm{diag}(\tilde H_k)\,\tilde X_k + \tilde W_k = \sqrt{N}\,\mathrm{diag}(\tilde X_k)\,\tilde H_k + \tilde W_k \tag{4}$$

where $\tilde X_k = F^\dagger X_k$ and $\tilde H_k = F^\dagger H_k$ are the DFT of the input vector and channel impulse
response at the $k$-th symbol respectively, and both are vectors of length $N$. Each input vector
constitutes an OFDM symbol. The frequency domain noise $\tilde W_k = F^\dagger W_k$ is a zero mean complex
Gaussian vector with covariance matrix $C_{\tilde W_k} = I$. The multiplication by $\sqrt{N}$ follows from the
definition of the DFT matrix, (1). The channel itself is better characterized in the time domain
where $H_k = [h_{k,0}, \ldots, h_{k,L-1}, 0, \ldots, 0]^T$, and $L$ is termed the channel delay spread.
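
The structure of (4) can be checked numerically. The following is a minimal sketch, assuming the unitary DFT matrix of (1), a noiseless channel, and an OFDM symbol whose cyclic prefix has already turned the channel into a circular convolution; the tap values, the input sequence and the helper names are arbitrary illustrations and not part of the original analysis.

```python
import cmath
import math

def dft_matrix(n):
    """Unitary DFT matrix with (F)_{m,k} = exp(-2j*pi*m*k/n) / sqrt(n)."""
    return [[cmath.exp(-2j * math.pi * m * k / n) / math.sqrt(n)
             for k in range(n)] for m in range(n)]

def to_frequency(f, x):
    """Apply F^dagger (the conjugate transpose of F) to the vector x."""
    n = len(x)
    return [sum(f[k][m].conjugate() * x[k] for k in range(n)) for m in range(n)]

def circular_convolution(h, x):
    """Cyclic convolution of two equal-length sequences."""
    n = len(x)
    return [sum(h[l] * x[(t - l) % n] for l in range(n)) for t in range(n)]

N, L = 8, 3
# Hypothetical channel: L taps followed by N - L zeros, as in the definition of H_k.
h = [0.7 + 0.2j, -0.3 + 0.5j, 0.1 - 0.4j] + [0j] * (N - L)
# Arbitrary input block standing in for one OFDM symbol.
x = [complex((-1) ** t, (t % 3) - 1) for t in range(N)]

F = dft_matrix(N)
h_f, x_f = to_frequency(F, h), to_frequency(F, x)
y_f = to_frequency(F, circular_convolution(h, x))   # left-hand side of (4), no noise
model = [math.sqrt(N) * hf * xf for hf, xf in zip(h_f, x_f)]  # right-hand side of (4)

max_err = max(abs(a - b) for a, b in zip(y_f, model))
print(max_err)
```

The factor sqrt(N) appears because both transforms are normalized by 1/sqrt(N), so the product of two transformed vectors carries one normalization factor more than the transform of their convolution.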

This model is an appropriate representation of an underspread channel in which we assume
$T_c \gg N \gg L - 1$,^2 where $T_c$ is the channel coherence time [15]. The assumption guarantees
that the channel Doppler spread is small enough to be negligible, and that the channel impulse
response can be considered constant throughout the $N$ samples that constitute an OFDM symbol
(for analysis of faster fading channels see [18]). The small channel delay spread, $L$, allows us to
neglect the time required for the cyclic prefix ($L - 1$ samples after each OFDM symbol).

In Section IV we define a new parameter, the effective coherence time, $\tilde T_c(\rho_x)$, which identifies
the effective number of received samples that can be used for channel estimation. The impact
of the effective coherence time on the channel capacity will be treated in later sections.

The mathematical analysis in the paper requires the following assumptions: 

Assumption I: The channel is underspread, i.e., the channel coherence time and the effective
coherence time (measured in channel samples) are large enough so that: $T_c, \tilde T_c(\rho_x) \gg N$.

Assumption II: The channel has a proper complex Gaussian distribution ([19]). 

Assumption III: The transmitter has no knowledge on the channel realization. 

Assumption IV: The channel is ergodic, (wide sense) stationary, with uncorrelated scattering 
(WSSUS channel). 

Assumption V: The auto-correlation function of the different channel taps is identical up to a 
multiplication by a scalar. 

Assumption VI: The expectation of the channel impulse response is frequency flat. 

Assumption I is actually not needed for the analysis presented herein. We state it as our first 
assumption because it is the basis for the OFDM model. If the coherence time is not significantly 
larger than the symbol length, then our model, which implicitly assumes that the channel does 
not change during an OFDM symbol, will not be realistic. 

^2 We use the notation $N \gg L - 1$ in order to emphasize that in the flat fading case ($L = 1$) it is sufficient to use $N = 1$.
Assumption III stems from the fact that the system has no feedback. As a result, the
transmitted messages are statistically independent from the channel. The requirement of uncorrelated
scattering in Assumption IV is needed mostly for low SNR where we show that it is better 
to concentrate the transmission power in a single frequency bin. The uncorrelated scattering 
guarantees that if in one OFDM symbol we transmit in a certain frequency bin, the best channel 
estimate for the next OFDM symbol will be achieved at the same frequency bin (see derivation 
in Appendix B). 

Assumptions V and VI somewhat narrow the scope of the analysis. However, we keep them 
to simplify the mathematical derivation, and to reach closed-form expressions. Assumption V, 
although not necessarily realistic, is common to many simple channel models (e.g., assuming that 
all multipath components have identical Doppler spread). Channels that satisfy this assumption
are sometimes referred to as delay separable channels [7]. Assumption VI basically states that at 
most one channel tap has an expectation different from zero, and it includes the case of Rayleigh 
fading (the expectation is zero) and the case of line of sight (LOS) propagation (one channel 
tap has an expectation different from zero). 

Using Assumption II, the stacking of the channel vector has a proper complex Gaussian
distribution, $H_0^k \sim \mathcal{CN}(E[H_0^k], C_{H_0^k})$. Using Assumption IV the amplitudes of the different
channel taps (in the time domain, $h_{k,0}, \ldots, h_{k,L-1}$) are statistically independent. Using also
Assumption V the covariance matrices of the channel taps are identical up to a multiplication
by a scalar. Considering $k + 1$ OFDM symbols, we denote by $A_k$ the single-tap $(k+1) \times (k+1)$
Toeplitz covariance matrix, in which each element is given by

$$(A_k)_{i,j} = C_{h_{i,l},h_{j,l}} / C_{h_{0,l}}. \tag{5}$$

We also define the $k \times 1$ vector $D_k$ in which the $i$-th element is given by

$$d_{k,i} = (A_k)_{i,k}, \quad i = 0, \ldots, k-1. \tag{6}$$

This vector is the correlation between the channel tap value at the $k$-th symbol and its value at
all previous symbols. Considering all channel taps, the covariance matrix of the stacked channel
vector $H_0^k$ is:



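$$C_{H_0^k} = A_k \otimes C_{H_0} = \begin{pmatrix} (A_k)_{0,0} C_{H_0} & (A_k)_{0,1} C_{H_0} & \cdots & (A_k)_{0,k} C_{H_0} \\ (A_k)_{1,0} C_{H_0} & \ddots & & (A_k)_{1,k} C_{H_0} \\ \vdots & & \ddots & \vdots \\ (A_k)_{k,0} C_{H_0} & (A_k)_{k,1} C_{H_0} & \cdots & (A_k)_{k,k} C_{H_0} \end{pmatrix} \tag{7}$$

and $C_{H_0}$ is an $N \times N$ diagonal matrix, in which each diagonal element represents the power of
one channel tap (note that due to Assumption IV, we can write $C_{H_k} = C_{H_0}$ for any $k$). In the
following we mostly consider the frequency domain channel, $\tilde H_k = F^\dagger H_k$. Its covariance matrix
is given by $C_{\tilde H_0} = F^\dagger C_{H_0} F$. For normalization purposes we will use $\mathrm{Tr}[C_{H_0}] = 1$ (and hence all
elements on the diagonal of $C_{\tilde H_0}$ equal $1/N$).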
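
The claim that every frequency bin carries power $1/N$ follows in one line, assuming the unitary DFT matrix of (1) (so that $|(F)_{l,m}|^2 = 1/N$ for all $l, m$) and a diagonal time-domain covariance $C_{H_0}$:

```latex
\left(C_{\tilde H_0}\right)_{m,m}
  = \left(F^\dagger C_{H_0} F\right)_{m,m}
  = \sum_{l=0}^{N-1} \left|(F)_{l,m}\right|^2 \left(C_{H_0}\right)_{l,l}
  = \frac{1}{N}\,\mathrm{Tr}\!\left[C_{H_0}\right]
  = \frac{1}{N}.
```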

Another quantity which is useful in the characterization of the channel is the channel
conditional covariance matrix given the past transmitted and received symbols:

$$\mathcal{E}_k = E\left[\left(H_k - E[H_k \mid X_0^{k-1}, Y_0^{k-1}]\right)\left(H_k - E[H_k \mid X_0^{k-1}, Y_0^{k-1}]\right)^\dagger \,\Big|\, X_0^{k-1}, Y_0^{k-1}\right]. \tag{8}$$

This conditional covariance matrix is in general a random quantity since it depends on the
random vectors $X_0^{k-1}$ (as the channel and the noise are jointly Gaussian, it does not depend on
$Y_0^{k-1}$). In some cases (e.g., constant amplitude modulations) the resulting conditional covariance
matrix is deterministic (does not depend on $X_0^{k-1}$), and hence will be denoted by $\mathsf{E}_k$. We will also
use the limit as both time and SNR tend to infinity, $\mathsf{E}_\infty = \lim_{k\to\infty} \lim_{\rho_x\to\infty} \mathsf{E}_k$. Further details
on the conditional channel distribution given the past transmitted and received symbols can be
found in Section VII.



B. Power constraints 

The input signal has constraints both on its average power and on its "peakiness". Defining
the OFDM symbol power and power matrix as:

$$p_k = X_k^\dagger X_k, \qquad \mathcal{P}_k = \mathrm{diag}([p_0, \ldots, p_k]), \tag{9}$$

respectively, we consider two types of constraints preventing the use of very high peak powers.
The first, peak constraint, limits the peak power of each OFDM symbol:

$$p_k \leq N\rho_x, \quad \forall k, \tag{10}$$

with probability 1. (The peak power constraint does not involve statistical averaging, and therefore
it is more relevant in practical systems.)

The second constraint, quadratic constraint [5], [8], limits the first and second moments of
the OFDM symbol power:

$$E[p_k] \leq N\rho_x, \qquad E[p_k^2] \leq \alpha N^2 \rho_x^2, \quad \forall k, \tag{11}$$

where $\alpha \geq 1$ is a constant. This constraint limits the probability of very high peak
powers, but still permits a relatively simple analysis (for example by allowing Gaussian signaling).
As the noise variance is normalized to 1, in the following we will mostly refer to $\rho_x$ as the
system SNR.
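
The two constraints can be compared on a simple example. The sketch below assumes the constraints read $p_k \leq N\rho_x$ for (10) and $E[p_k] \leq N\rho_x$, $E[p_k^2] \leq \alpha N^2\rho_x^2$ for (11); the on-off power distribution and the function names are hypothetical illustrations, chosen only to show how the constant $\alpha$ limits how bursty the transmission may be.

```python
def symbol_power_moments(powers, probs):
    """First and second moments of a discrete symbol-power distribution."""
    mean = sum(p * q for p, q in zip(powers, probs))
    second = sum(p * p * q for p, q in zip(powers, probs))
    return mean, second

def satisfies_peak(powers, probs, n, rho):
    """Peak constraint (10): p_k <= N*rho with probability 1."""
    limit = n * rho * (1 + 1e-12)          # small slack for rounding
    return all(p <= limit for p, q in zip(powers, probs) if q > 0)

def satisfies_quadratic(powers, probs, n, rho, alpha):
    """Quadratic constraint (11): E[p_k] <= N*rho and E[p_k^2] <= alpha*(N*rho)^2."""
    mean, second = symbol_power_moments(powers, probs)
    slack = 1 + 1e-12                      # small slack for rounding
    return mean <= n * rho * slack and second <= alpha * (n * rho) ** 2 * slack

N, rho, alpha = 64, 0.1, 4.0
for beta in (1.0, 2.0, 4.0, 8.0):
    # Hypothetical on-off scheme: power beta*N*rho during a fraction 1/beta of the symbols.
    powers = [beta * N * rho, 0.0]
    probs = [1.0 / beta, 1.0 - 1.0 / beta]
    print(beta, satisfies_peak(powers, probs, N, rho),
          satisfies_quadratic(powers, probs, N, rho, alpha))
```

For this on-off scheme the second moment equals $\beta N^2\rho_x^2$, so the quadratic constraint holds exactly when $\beta \leq \alpha$, while the peak constraint only admits $\beta = 1$.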



IV. Main results 

In this section we introduce without proof upper and lower bounds on the capacity of the 
noncoherent channel described in the previous section. These bounds are characterized only by 
the system SNR and the effective coherence time: 

Definition 1: The effective coherence time is given by:

$$\tilde T_c(\rho_x) = 2N \lim_{k\to\infty} D_k^\dagger \left( N\rho_x \left[ A_{k-1} - D_k D_k^\dagger \right] + I \right)^{-1} D_k + N. \tag{12}$$

As will be shown, the effective coherence time characterizes the "effective number of channel
samples usable for channel estimation". Although this statement is not yet precise, it
will be discussed further in Subsection V-A.
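
As a rough illustration of the SNR dependence, consider truncating the limit in Definition 1 at $k = 1$, i.e., keeping a single past OFDM symbol with normalized correlation $a$. This truncation is a simplification made here for illustration only, and it assumes that (12) reads $\tilde T_c(\rho_x) = 2N \lim_k D_k^\dagger (N\rho_x[A_{k-1} - D_k D_k^\dagger] + I)^{-1} D_k + N$. With $A_0 = 1$ and $D_1 = a$:

```latex
\tilde T_c(\rho_x)\Big|_{k=1}
  = \frac{2N|a|^2}{N\rho_x\left(1 - |a|^2\right) + 1} + N,
\qquad
\lim_{\rho_x \to 0} = N\left(1 + 2|a|^2\right),
\qquad
\lim_{\rho_x \to \infty} = N \quad (|a| < 1).
```

Even in this toy case the quantity is nonincreasing in $\rho_x$, and it is independent of the SNR only when $|a| = 1$, that is, when the past symbol predicts the current one perfectly.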

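Theorem 1: The capacity of a channel with a peak power constraint is upper bounded by:

$$C \leq \min\left\{ UB_{low}^{(pk)}(\rho_x),\; UB_{coh}(\rho_x) \right\}, \tag{13}$$

and the capacity of a channel with a quadratic power constraint is upper bounded by:

$$C \leq \min\left\{ UB_{low}^{(qd)}(\rho_x),\; UB_{coh}(\rho_x) \right\}, \tag{14}$$

where:

$$UB_{coh}(\rho_x) = E\left[ \log\left( 1 + N\rho_x |\tilde h_{0,0}|^2 \right) \right] \tag{15}$$

$$UB_{low}^{(pk)}(\rho_x) = N\rho_x \left| E[\tilde h_{0,0}] \right|^2 + \rho_x - \frac{1}{N} \log\left( 1 + \frac{N\rho_x}{1 + \frac{\rho_x}{2}\left( \tilde T_c(\rho_x) - N \right)} \right) \tag{16}$$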
$$UB_{low}^{(qd)}(\rho_x) = N\rho_x \left| E[\tilde h_{0,0}] \right|^2 + \ldots \tag{17}$$

Proof of Theorem 1: see Result 1, Lemma 1.a and Lemma 1.b.^3

The term $UB_{coh}(\rho_x)$ in the bound is the capacity of the coherent channel with only an average
power constraint. We will show in Subsection V-B that it is useful in the medium-to-high SNR
regime. For low SNR the coherent channel capacity is not achievable and tighter bounds are
described by $UB_{low}^{(pk)}(\rho_x)$ and $UB_{low}^{(qd)}(\rho_x)$. For low enough SNR, using $\log(1+x) = x - \frac{1}{2}x^2 + o(x^2)$
and $1/(1+x) = 1 - x + o(x)$, these bounds can be approximated as:

$$UB_{low}^{(pk)}(\rho_x) \approx N\rho_x \left| E[\tilde h_{0,0}] \right|^2 + \frac{\rho_x^2}{2} \tilde T_c(\rho_x), \tag{18}$$

$$UB_{low}^{(qd)}(\rho_x) \approx N\rho_x \left| E[\tilde h_{0,0}] \right|^2 + \alpha \frac{\rho_x^2}{2} \tilde T_c(\rho_x). \tag{19}$$

If the channel has a constant term ($|E[\tilde h_{0,0}]|$) different from zero, this is the dominant term.
Otherwise, the bounds are proportional to $\rho_x^2 \tilde T_c(\rho_x)$, but the quadratic power constraint allows a
capacity which is $\alpha$ times larger.
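
The step from the peak-constraint bound to its low SNR form can be written out explicitly. The derivation below is a sketch that assumes (16) reads $UB_{low}^{(pk)} = N\rho_x|E[\tilde h_{0,0}]|^2 + \rho_x - \frac{1}{N}\log\big(1 + N\rho_x/(1 + \frac{\rho_x}{2}(\tilde T_c(\rho_x) - N))\big)$, and that both $N\rho_x$ and $\rho_x \tilde T_c(\rho_x)$ are small, so that terms beyond second order in $\rho_x$ can be dropped:

```latex
\begin{align*}
\frac{N\rho_x}{1 + \frac{\rho_x}{2}\left(\tilde T_c - N\right)}
  &= N\rho_x - \frac{N\rho_x^2}{2}\left(\tilde T_c - N\right) + o(\rho_x^2), \\
\frac{1}{N}\log\left(1 + \frac{N\rho_x}{1 + \frac{\rho_x}{2}\left(\tilde T_c - N\right)}\right)
  &= \rho_x - \frac{\rho_x^2}{2}\left(\tilde T_c - N\right) - \frac{N\rho_x^2}{2} + o(\rho_x^2)
   = \rho_x - \frac{\rho_x^2}{2}\tilde T_c + o(\rho_x^2), \\
UB_{low}^{(pk)}(\rho_x)
  &= N\rho_x\left|E[\tilde h_{0,0}]\right|^2 + \frac{\rho_x^2}{2}\tilde T_c(\rho_x) + o(\rho_x^2).
\end{align*}
```

The linear terms in $\rho_x$ cancel, which is why the noncoherent capacity of a zero mean channel is quadratic rather than linear in the SNR in this regime.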

For the lower bounds, we choose specific input distributions (modulations). For low SNR it 
is important to use a modulation that maximizes the estimation accuracy subject to the power 
constraints. Such a maximization is achieved by constant amplitude modulation, and therefore 
we derive a lower bound using Quadrature Phase Shift Keying (QPSK). As it was noted in 
previous works (e.g., [6], [9]), in the low SNR regime it is better to use only part of the time 
and/or part of the frequency band. Using only part of the available degrees of freedom reduces 
the number of parameters that need to be estimated and hence results in better estimation. In the 
following bounds we allow the use of only $r$ out of the $N$ frequency bins, and allow transmitting
in $1/\beta$ of the time. In the time domain the signal is transmitted in long blocks (significantly
larger than the effective coherence time), and the silent periods between blocks result in a duty
cycle of $1/\beta$. The resulting bound is:

^3 The proof of each theorem is divided into two lemmas. In the first lemma we use information-theoretical arguments to
prove the theorem assuming a knowledge of the conditional channel distribution (defined by the conditional channel covariance
matrix), while in the second lemma we bound the conditional channel covariance matrix.

Theorem 2: The capacity of a channel with a peak power constraint is lower bounded by:

$$C \geq \max_{r \leq N} LB_{QPSK}(\rho_x, r, 1) \tag{20}$$

and the capacity of a channel with a quadratic power constraint is lower bounded by:

$$C \geq \max_{r \leq N} \; \max_{1 \leq \beta \leq \alpha} LB_{QPSK}(\rho_x, r, \beta) \tag{21}$$

where

$$LB_{QPSK}(\rho_x, r, \beta) = \begin{cases} \ldots, & r < N \\ \ldots, & r = N \end{cases} \tag{22}$$

$$C_{QPSK}(\rho) = \log 4 - \sqrt{\frac{2}{\pi}} \int_{-\infty}^{\infty} e^{-y^2/2} \log\left( 1 + e^{-2(\rho + \sqrt{\rho}\, y)} \right) dy \tag{23}$$

is the achievable mutual information for QPSK transmission over an AWGN channel with SNR
equal to $\rho$ ([20]) and $u \sim \mathcal{CN}(0, 1)$ is a proper complex Gaussian random variable.

Proof of Theorem 2: see Lemma 2.a and Lemma 2.b.
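
The QPSK mutual information can be evaluated with a simple quadrature. The sketch below assumes that (23) is the standard expression $C_{QPSK}(\rho) = \log 4 - \sqrt{2/\pi}\int e^{-y^2/2}\log(1 + e^{-2(\rho + \sqrt{\rho}y)})\,dy$ in nats; the integration range, the step count and the function names are arbitrary choices made for this illustration.

```python
import math

def softplus(t):
    """Numerically stable evaluation of log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def c_qpsk(rho, y_max=10.0, steps=4000):
    """QPSK mutual information over AWGN at SNR rho, in nats per symbol."""
    h = 2.0 * y_max / steps
    total = 0.0
    for i in range(steps + 1):
        y = -y_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal rule end points
        arg = -2.0 * (rho + math.sqrt(rho) * y)
        total += weight * math.exp(-0.5 * y * y) * softplus(arg)
    return math.log(4.0) - math.sqrt(2.0 / math.pi) * h * total

for rho in (0.01, 0.1, 1.0, 10.0, 100.0):
    # Second column: QPSK rate; third column: unconstrained AWGN capacity log(1 + rho).
    print(rho, c_qpsk(rho), math.log1p(rho))
```

At low SNR the result behaves as $\rho + o(\rho)$, which is the property used in the low SNR approximation of the bound, and at high SNR it saturates at $\log 4$ nats (2 bits).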
Note that if the channel has zero mean ($E[\tilde h_{0,0}] = 0$) then the bound in (22) simplifies to:

$$LB_{QPSK}(\rho_x, r, \beta) = \begin{cases} \ldots, & r < N \\ \ldots, & r = N \end{cases}$$



This bound is especially interesting in the low SNR limit. In that case, the best bound uses
only a single frequency bin, and the minimum allowed duty cycle. Using $C_{QPSK}(x) = x + o(x)$
and $1/(1 + x) = 1 + o(1)$, the low SNR approximation of the above bound for the peak power
constraint is:

$$LB_{QPSK}(\rho_x, 1, 1) \approx N\rho_x \left| E[\tilde h_{0,0}] \right|^2 + \frac{\rho_x^2}{2}\left( \tilde T_c(\rho_x) - N \right) \tag{24}$$

and the low SNR approximation for the quadratic power constraint is:

$$LB_{QPSK}(\rho_x, 1, \alpha) \approx N\rho_x \left| E[\tilde h_{0,0}] \right|^2 + \alpha \frac{\rho_x^2}{2}\left( \tilde T_c(\alpha\rho_x) - N \right). \tag{25}$$



Comparing with (18) and (19) we see that the bounds are tight for the peak power constraint
as long as $\tilde T_c(\rho_x) \gg N$,^4 while for the quadratic power constraint the bound tightness depends
on the properties of the effective coherence time. When the limit $\tilde T_{c0} = \lim_{\rho_x \to 0} \tilde T_c(\rho_x)$ exists,
the bounds are tight up to a factor of $(\tilde T_{c0} - N)/\tilde T_{c0}$. We will further discuss properties of the
effective coherence time in Subsection V-A.

^4 Much of the work on OFDM is concerned with the peak-to-average power ratio (PAPR), while in this work we only consider
the entire symbol energy. Yet, we note that in the low SNR limit the lower bound uses a single frequency bin, and hence the
PAPR is 1.

For higher SNR we expect the coherent channel bound to be achievable. In the coherent 
channel the capacity is achieved using a wideband Gaussian modulation, and we expect it to 
be close to optimal also for the noncoherent channel, in the appropriate SNR regime. Several 
simulations performed using Gaussian modulation showed convergence to the upper bound, and 
in Subsection V-A we even discuss a simulation-based approximation that holds for large enough 
effective coherence times and can be used to approximately lower bound the channel capacity 
using Gaussian modulation. 

However, so far we have not been able to prove a useful lower bound using Gaussian 
modulation. Instead, we present here a bound which is based on truncated complex Gaussian 
modulation. This modulation is defined in Section VI. In essence, it uses a proper complex Gaus- 
sian distribution with the condition of a minimal and maximal power for the signal transmitted 
in each active bin. The presented bound will show that indeed the coherent channel capacity is 
achievable for high SNR and large enough effective coherence times. 

For this bound we also require a zero mean channel ($E[\tilde h_{0,0}] = 0$). An alternative (simpler)
bound that does not require this assumption is presented in Appendix D. This alternative bound 
is in general less tight, and therefore we prefer to assume a zero mean channel and focus on the 
following bound: 

Theorem 3: The capacity of a channel with zero mean ($E[\tilde h_{0,0}] = 0$) and a quadratic power
constraint is lower bounded by:

$$C \geq \sup_{\eta > 0} \; \max_{r \leq N} \; \max_{1 \leq \beta \leq \ldots} \; LB_{TG}^{(qd)}(\rho_x, r, \beta, \eta) - \frac{r}{N\beta}\eta \tag{26}$$

and the capacity of a channel with zero mean ($E[\tilde h_{0,0}] = 0$) and a peak power constraint is
lower bounded by:

$$C \geq \sup_{\eta > 0} \; \sup_{\xi > \eta} \; \max_{r \leq N} \; LB_{TG}^{(pk)}(\rho_x, r, \eta, \xi) + \frac{r}{N} \log\left( e^{-\eta} - e^{-\xi} \right) \tag{27}$$
where

$$LB_{TG}^{(qd)}(\rho_x, r, \beta, \eta) = \begin{cases} \ldots, & r < N \\ \ldots, & r = N \end{cases} \tag{28}$$

$$LB_{TG}^{(pk)}(\rho_x, r, \eta, \xi) = \begin{cases} \ldots, & r < N \\ \ldots, & r = N \end{cases} \tag{29}$$

$$\kappa_{qd}(\eta) = (1 + \eta) \log\left( 1 + \frac{1}{\eta} \right) \tag{30}$$

$$\kappa_{pk}(\eta, \xi) = \ldots \tag{31}$$

and $u \sim \mathcal{CN}(0, 1)$ is a proper complex Gaussian random variable.

Proof of Theorem 3: see Lemma 3.a and Lemma 3.b.

The bound for the quadratic power constraint is tight for high SNR and high effective coherence
time. Together with Theorem 2 it yields a pair of bounds encompassing the range of SNR for
which the effective coherence time is large enough. Considering in particular the case of $\beta = 1$
and $r = N$, we can see two penalty terms: a capacity penalty term of $\eta$ (the second term on
the right hand side of (26)), and an estimation penalty term in the denominator of (28). To show
the bound tightness we can choose for example

$$\eta = \frac{1}{e^{\sqrt{\tilde T_c(\rho_x)}} - 1}. \tag{32}$$

Note that $\tilde T_c(\rho_x) \geq N \geq 1$ and hence we have $\eta < 1$ and $\kappa_{qd}(\eta) < 2\log\left(1 + \frac{1}{\eta}\right) = 2\sqrt{\tilde T_c(\rho_x)}$.
Therefore, the estimation penalty term in (28) satisfies:

$$\ldots$$

and becomes negligible for large enough $\tilde T_c(\rho_x)$. On the other hand, the capacity penalty term
is proportional to $\eta$, (32), and is also negligible for large $\tilde T_c(\rho_x)$. Thus, for high SNR and large
enough effective coherence time, the bound can be approximated as:

$$\sup_{0 < \eta < 1} LB_{TG}^{(qd)}(\rho_x, N, 1, \eta) \approx E\left[ \log\left( 1 + \rho_x |u|^2 \right) \right]. \tag{33}$$

Since in this case $\tilde h_{0,0} \sim \mathcal{CN}(0, \frac{1}{N})$, this approximation converges to the coherent channel upper
bound $UB_{coh}(\rho_x)$, (15).
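
For a zero mean (Rayleigh) channel the coherent bound reduces to $E[\log(1 + \rho_x|u|^2)]$ with $|u|^2$ exponentially distributed, which is easy to evaluate. The sketch below assumes this Rayleigh case and integrates $e^{-t}\log(1 + \rho_x t)$ numerically; the truncation point, the step count and the function name are arbitrary choices made for this illustration.

```python
import math

def ub_coh(rho, t_max=60.0, steps=60000):
    """E[log(1 + rho*|u|^2)] for u ~ CN(0,1), i.e. |u|^2 ~ Exp(1), in nats."""
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal rule end points
        total += weight * math.exp(-t) * math.log1p(rho * t)
    return h * total

for rho in (0.1, 1.0, 10.0, 100.0):
    # By Jensen's inequality the fading value stays below the AWGN capacity log(1 + rho).
    print(rho, ub_coh(rho), math.log1p(rho))
```

The gap to $\log(1 + \rho_x)$ is the usual cost of fading with perfect receiver knowledge; it is this coherent value, not the AWGN capacity, that the lower bound approaches in (33).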

For the peak power constraint the situation is more complicated due to the power decrease that
is required to allow for the truncated Gaussian modulation to satisfy the peak power constraint.
This power penalty term is given by $\frac{1}{\xi}\left( 1 + \frac{\eta e^{-\eta} - \xi e^{-\xi}}{e^{-\eta} - e^{-\xi}} \right)$ (in the numerator of (29)). For high
enough effective coherence time (using for example $\eta$ as in (32)) the estimation will be good
enough so that $\eta$ will be negligible and also the denominator of (29) will be very close to 1. In
such a case, at high SNR and $r = N$ the capacity loss compared to the coherent channel capacity
converges to:

$$\log\left( \frac{1}{\xi}\left( 1 - \frac{\xi}{e^{\xi} - 1} \right) \right) + \log\left( 1 - e^{-\xi} \right)$$

which is maximized by $\xi = 1.79$ at a loss of $-1.21$ nats. Thus, the bound will be tight only for
very high SNR, where such capacity loss is negligible, if the effective coherence time $\tilde T_c(\rho_x)$ is
still large enough.
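
The quoted optimum can be reproduced numerically. The sketch below assumes the high SNR loss for $\eta \to 0$ and $r = N$ is $\log\big(\frac{1}{\xi}(1 - \frac{\xi}{e^{\xi} - 1})\big) + \log(1 - e^{-\xi})$, i.e., the logarithm of the power penalty plus the logarithmic term of (27); the search interval and the helper names are choices made for this illustration.

```python
import math

def peak_loss(xi):
    """High-SNR capacity loss in nats for eta -> 0 and r = N, at truncation level xi."""
    power_penalty = (1.0 - xi / math.expm1(xi)) / xi    # average-to-peak power ratio
    return math.log(power_penalty) + math.log(-math.expm1(-xi))

def golden_section_max(f, lo, hi, tol=1e-10):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d      # the maximum lies in [a, d]
        else:
            a = c      # the maximum lies in [c, b]
    return 0.5 * (a + b)

xi_star = golden_section_max(peak_loss, 0.1, 10.0)
print(xi_star, peak_loss(xi_star))
```

The two terms pull in opposite directions: a small $\xi$ truncates most of the Gaussian mass, while a large $\xi$ forces the average power far below the allowed peak, so the best compromise sits at a moderate truncation level.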

V. Discussion 

A. Effective coherence time 

The three bounds in the previous section show that the effective coherence time can very well 
characterize the noncoherent channel capacity. Yet, its definition:

$$\tilde T_c(\rho_x) = 2N \lim_{k\to\infty} D_k^\dagger \left( N\rho_x \left[ A_{k-1} - D_k D_k^\dagger \right] + I \right)^{-1} D_k + N$$

is not intuitive and needs a discussion on its meaning and properties. In the following we
present some interesting properties of the effective coherence time, followed by a discussion on
the reasoning and motivation for each property. In particular, we will also explain why we use
the term effective coherence time for this quantity.
Properties of the effective coherence time

1) Using wideband constant amplitude modulation (i.e., all OFDM frequency bins are active
and all have identical amplitude, $r = N$), the conditional channel covariance matrix, given
all past transmitted and received symbols, is a diagonal matrix. The diagonal element that
corresponds to any channel tap such that $(C_{H_0})_{l,l} > 0$ is given by:

$$\lim_{k\to\infty} (\mathsf{E}_k)_{l,l} = \frac{(C_{H_0})_{l,l}}{1 + \frac{\rho_x (C_{H_0})_{l,l}}{2}\left( \tilde T_c\left( \rho_x \cdot (C_{H_0})_{l,l} \right) - N \right)}. \tag{34}$$

2) Using wideband constant amplitude modulation, each diagonal element of the conditional
channel covariance matrix in the frequency domain, given all past transmitted and received
symbols, is upper bounded by:

$$\lim_{k\to\infty} (\tilde{\mathsf{E}}_k)_{m,m} \leq \frac{1}{N} \cdot \frac{1}{1 + \frac{\rho_x}{2L}\left( \tilde T_c(\rho_x) - N \right)}. \tag{35}$$

Equality in (35) is achieved if the channel energy is concentrated into a single tap (i.e.,
if $L = 1$).

3) Using narrowband (i.e., only a single frequency bin is active, using all of the OFDM
symbol power, $r = 1$) constant amplitude modulation, the relevant diagonal element of the
conditional channel covariance matrix in the frequency domain, given all past transmitted
and received symbols, is given by:

$$\lim_{k\to\infty} (\tilde{\mathsf{E}}_k)_{m,m} = \frac{1}{N} \cdot \frac{1}{1 + \frac{\rho_x}{2}\left( \tilde T_c(\rho_x) - N \right)}. \tag{36}$$



4) At the low SNR limit, define:

$$T_{c_0} = \lim_{\rho_x \to 0} \tilde{T}_c(\rho_x) = 2N \lim_{k\to\infty} D_k^\dagger D_k + N. \tag{37}$$

If this limit exists, it coincides with the definition of coherence time used by Medard and
Gallager [5].
5) For any $\lambda > 1$ the effective coherence time satisfies:

$$\tilde{T}_c(\rho_x)/\lambda \le \tilde{T}_c(\lambda\rho_x) \le \tilde{T}_c(\rho_x). \tag{38}$$



6) If the prediction error is not zero, i.e.:

$$\mathcal{E}_\infty = \lim_{k\to\infty} \lim_{\rho_x\to\infty} \mathcal{E}_k = \lim_{k\to\infty} C_H\left[1 - D_k^\dagger A_{k-1}^{-1} D_k\right] > 0 \tag{39}$$

then:

$$\lim_{\rho_x\to\infty} \tilde{T}_c(\rho_x) = N. \tag{40}$$



Properties 1-3 are shown in the proof of Lemma 2.b in Section VII. They reflect the relation 
between the effective coherence time and the conditional channel distribution, which is the basis 
for the capacity bounds. They can also be used to argue the relationship between the coherence 
time and the effective coherence time. To particularize the results in a case where we have a 
direct and intuitive relation between capacity and coherence time, we consider a block fading 
channel [21]. In this model, the time axis is divided into blocks of length $T_c/N$ symbols, and the
channel realization is fixed within each block and independent between blocks. This channel is
nonstationary, and exhibits different statistics for each symbol (as an example, the first symbol of
each block has correlation only with future symbols, while the last symbol of each block has
correlation only with past symbols). As a reference, we assume that the $k$-th symbol is the middle
symbol of a block and that the block length is an odd multiple of the symbol length $N$. It is easy
to show^ that the conditional channel covariance matrix for the middle symbol of a block satisfies
Properties 1-3 if we substitute $\tilde{T}_c(\rho_x)$ with $T_c$, the block length. This similarity
motivates the term effective coherence time as an extension of the coherence time definition to more
general channel models.
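The block fading claim can be illustrated for the narrowband case (Property 3). The sketch below is only an illustration under stated assumptions: it takes the observation model $Y_k = \sqrt{N}\,\mathrm{diag}(X_k) H_k + Z_k$ with unit-variance noise (as suggested by the covariance expressions in Section VI), a frequency bin with prior variance $1/N$, known constant-amplitude symbols with $|x|^2 = N\rho_x$, and $d = 0.5(T_c/N - 1)$ past symbols inside the block. The function names are hypothetical.

```python
def posterior_variance_block_fading(rho_x: float, n_bins: int, block_len: float) -> float:
    """Conditional variance of one frequency bin for the middle symbol of a block.

    Assumed model (illustrative): y_j = sqrt(N) * x_j * h + z_j for each of the d past
    symbols in the block, with |x_j|^2 = N * rho_x, Var(h) = 1/N and Var(z_j) = 1.
    """
    d = 0.5 * (block_len / n_bins - 1.0)       # past symbols that share the block
    prior_precision = float(n_bins)            # 1 / Var(h)
    precision_per_symbol = n_bins * (n_bins * rho_x)  # N * |x|^2
    return 1.0 / (prior_precision + d * precision_per_symbol)


def property3_variance(rho_x: float, n_bins: int, coherence_time: float) -> float:
    """Assumed form of Property 3 with the coherence time held fixed."""
    return (1.0 / n_bins) / (1.0 + 0.5 * rho_x * (coherence_time - n_bins))


if __name__ == "__main__":
    for rho in (1e-3, 0.1, 1.0):
        direct = posterior_variance_block_fading(rho, n_bins=30, block_len=30 * 31)
        formula = property3_variance(rho, n_bins=30, coherence_time=30 * 31)
        print(rho, direct, formula)
```

With these assumptions the two expressions agree for every SNR, which is the sense in which the block length plays the role of the effective coherence time.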

Our simulations have shown that Properties 1-3 can also well approximate the conditional 
channel covariance matrix given all past transmitted and received symbols when using Gaussian 
distributed input signals, as long as the effective coherence time is large enough. The Gaussian 
modulation is of interest as it achieves the coherent channel capacity. So far, we have not been 
able to provide a formal proof of this, and this is the reason why we used truncated Gaussian 
modulation instead of Gaussian modulation for the capacity lower bound in Theorem 3. We 
demonstrate the accuracy of this approximation in the simulation results presented in Section 
V-C. 

Property 4 is easily verified from (12). As long as $T_{c_0}$ exists, the effective coherence time
can be seen as an extension of the coherence time defined in [5] to higher SNR. Note that
the effective coherence time gives a good characterization of the capacity even for channels for
which $T_{c_0}$ does not exist.

^For example by using $D_k = \mathbf{1}_d$ and $A_{k-1} = \mathbf{1}_d \mathbf{1}_d^T$ in (97) for the wideband case (with $\beta = 1$), and in (87) for the narrowband
case, with $d = 0.5(T_c/N - 1)$.

The major difference between any channel coherence time definition and the effective coherence
time is that the latter is a function of the system SNR. This may seem a surprising
characteristic of a coherence time, since intuitively one would expect the coherence time to
characterize the channel, regardless of the system parameters. This dependence of the effective
coherence time on SNR comes from the fact that at high SNR the system is more sensitive to
changes in the channel, and hence will effectively see a shorter coherence time. Alternatively, one
can say that a system with higher SNR requires better channel estimates. Thus, such a system
will be able to use only channel measurements that present higher correlation with the present 
channel state, so that an increase in the SNR will reduce the number of useful measurements 
and consequently reduce the effective coherence time. Note that some engineers actually regard 
the SNR as a part of the channel and not as a system parameter (in the sense that in many cases 
the transmission power, just like the channel, is a limitation given to the system designer and 
not part of the system design). 

Property 5 can be seen from (12) by considering the eigenvalue decomposition of the matrix
$A_{k-1} - D_k D_k^\dagger$. A more detailed description can be found in [22]. As reflected by Property 5, the
effective coherence time is a nonincreasing function of the SNR, and for most channel models 
it decreases as the SNR increases. 

Property 6 follows from Property 2 and deals with the high SNR extreme behavior of the 
effective coherence time. The prediction error is the conditional covariance of the channel 
given all past channel realizations (i.e., if the prediction error is zero then the channel is fully 
predictable). In regular channel models $\mathcal{E}_\infty$ is larger than zero [12], and represents the inherent
uncertainty in the channel estimation given the full knowledge of its past. Property 6 shows that
for regular channels the effective coherence time converges to $N$ as the SNR grows to infinity
(as we assume that the channel does not change within an OFDM symbol, the model cannot
show a coherence time smaller than $N$).

As we limit the present analysis to large effective coherence times, for such regular channels 
the presented bounds will not be tight for very high SNR. The looseness of the bounds for 
very high SNR is in agreement with the results of Lapidoth and Moser [12], which showed that 
the coherent channel capacity is not achievable in regular channels in the high SNR limit (the 
capacity grows only doubly logarithmically with SNR).

Although not proved herein, the effective coherence time can significantly decrease even for 
irregular channels. As an example consider a Clarke's fading channel [23] with a Doppler spread
of 300 Hz and an OFDM symbol length of 66.7 μs (which corresponds to an LTE mobile device
operating in the 2 GHz band and moving at a speed of 160 km/h). The effective coherence time
of this channel at 6 dB SNR is 50 symbols, and the underspread assumption is well verified.
However, at a very high SNR of 116 dB the effective coherence time drops to only 10 symbols
(at which point one might start questioning the underspread assumption).
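The physical parameters of this example can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the quoted 300 Hz figure is the maximum Doppler shift $f_d = v f_c / c$; the function and variable names are illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def speed_for_doppler(doppler_hz: float, carrier_hz: float) -> float:
    """Terminal speed (m/s) that produces a given maximum Doppler shift."""
    return doppler_hz * SPEED_OF_LIGHT_M_S / carrier_hz


doppler_hz = 300.0
carrier_hz = 2e9
symbol_s = 66.7e-6

speed_kmh = speed_for_doppler(doppler_hz, carrier_hz) * 3.6
# Number of OFDM symbols that fit in one period of the maximum Doppler frequency.
symbols_per_doppler_period = 1.0 / (doppler_hz * symbol_s)

print(f"speed = {speed_kmh:.1f} km/h")
print(f"symbols per Doppler period = {symbols_per_doppler_period:.1f}")
```

The first figure is consistent with the quoted speed of about 160 km/h. The second is only a rough time scale of the channel variation in units of OFDM symbols and is not the effective coherence time itself.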

B. Bound gap at finite SNR 

We have discussed the bound tightness at the high and low SNR limits. In this subsection 
we discuss the gap between the bounds at finite SNR. We will show that in most cases, if the
coherence time is large enough, this gap is not significant, and hence the bounds describe the
channel capacity very well. For simplicity, the discussion in this subsection is limited to the case
of a channel with zero mean ($E[h_{0,0}] = 0$).

We start with the peak power constraint for $\rho_x \le 2/\tilde{T}_c(\rho_x)$. Using $\log(1+x) \ge x - 0.5x^2$ the
upper bound (16) is upper bounded by:

$$\mathrm{UB}_{P}(\rho_x) \le \frac{\frac{\rho_x^2}{2}\left(\tilde{T}_c(\rho_x) - N\right)}{1 + \frac{\rho_x}{2}\left(\tilde{T}_c(\rho_x) - N\right)} + \frac{\frac{N\rho_x^2}{2}}{\left(1 + \frac{\rho_x}{2}\left(\tilde{T}_c(\rho_x) - N\right)\right)^2}. \tag{41}$$

For the lower bound we substitute in (23) the inequality^ $\log(1+e^{-x}) \le \log 2 - \frac{x}{2} + \frac{x^2}{8}$ which
results in $C_{\mathrm{QPSK}}(\rho) \ge \rho - \rho^2$. Substituting the inequality and $r = \beta = 1$ in the lower bound
(22), we get:

2 



^ (fe (p.) - N) N'-f ft (p.) - N 
LBqpsk^, 1, 1) > T^ ^ -^ -^ ^. (42) 

1 + f (Te (p.) - iV j + Np, (l + l| (fe (p,) - iV) + Np, 



^Using: $\log(1+e^{-x}) = \log 2 - \frac{x}{2} + \log\cosh\left(\frac{x}{2}\right)$, and $\cosh(x) \le e^{x^2/2}$.

These upper and lower bounds are almost identical for very low SNR and high effective coherence
time, and start diverging as the SNR increases. For the SNR range of interest, the largest
gap appears at the highest SNR: $\rho_x = 2/\tilde{T}_c(\rho_x)$. Defining the factor
$\kappa = \frac{\tilde{T}_c(\rho_x) - N}{\tilde{T}_c(\rho_x)}$ (which is approximately 1 for large enough
effective coherence time) we have $N\rho_x = 2 - 2\kappa$ and



Px 1 + K^ 



2 -2k: 



LB 



QPSK 




(43) 



(44) 



For example, if $\rho_x = 2/\tilde{T}_c(\rho_x)$ is satisfied for $\tilde{T}_c(\rho_x) = 100N$ then $\kappa = 0.99$ and the gap between
(43) and (44) is only 2.5%.



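Next, for $2/\tilde{T}_c(\rho_x) < \rho_x \le \frac{1}{2}$ (and still for the peak power constraint), we turn to a numerically
verified bound: For $u \sim \mathcal{CN}(0, 1)$ and $\rho \le 0.5$ one can verify that:

$$E\left[C_{\mathrm{QPSK}}\left(\rho |u|^2\right)\right] \ge 0.97\, E\left[\log\left(1 + \rho |u|^2\right)\right]$$

Using this inequality in (22) and substituting $r = N$, $\beta = 1$ we have: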

f ff,(p.)-iv)|up 



(45) 



LBQPSK(Px,Ar,l) = E 



a 



QPSK 



E 



a 



> 0.97E 




QPSK 



(46) 



Comparing this bound with the coherent channel upper bound, (15), and recalling that
$h_{0,0} \sim \mathcal{CN}(0, \frac{1}{N})$ one can evaluate the gap between the bounds. Again this gap is largest at the high
SNR limit, $\rho_x = 0.5$. Considering the example of $\tilde{T}_c(0.5) = 100N$ and $N = 5L$, the gap between
the bounds is at most 4%.

Thus, for the peak power constraint case the bounds are good if the effective coherence time
is large enough for SNRs of up to $\rho_x \le 0.5$. As stated in Section IV, for higher SNRs the bounds
are less tight for the peak power case (due to the power decrease that is required to allow for the 
truncated Gaussian modulation to satisfy the peak power constraint). The bounds can be tight 
again only at much higher SNR where the power penalty term (which results in an asymptotic 
difference of 1.2 nats) becomes negligible (if the effective coherence time is still large enough). 
In the case of the quadratic power constraint we already stated the bounds' tightness when
the effective coherence time is large enough for the high SNR limit and for the low SNR limit
if the limit $T_{c_0}$ exists. We next quantify the bounding gap for finite SNR.

In the low SNR regime the bounds can be quite loose, depending on the behavior of the
effective coherence time as a function of the SNR. Typically, the bounds are least tight at the
point where the two upper bounds intersect. If $\tilde{T}_c(\rho_x) \gg N$ then the two upper bounds ((15)
and (17)) intersect very close to $\rho_x = \frac{2}{(\alpha - 1)\tilde{T}_c(\rho_x)}$. Using the same derivation as in (42) and
substituting $r = 1$ and $\beta = \alpha$, we can lower bound the QPSK lower bound by:



LBqpsk(Px,1,«) > 



a# It (ap.) - N) Na^f (t («P.) - N 



1 + Nap, + ^ (fc (ap,) -N^ (l + Nap, + ^ (f^ (ap,) - N 



2- 



Redefining k = "Iff , and defining v = -g-r ^'^^\ we have v = ap,Tc{ctPx)/'^, Nap, 
v{2 - 2k) and 

2 A ^ vnp, _ 2v'^k^{2-2k)p, 

{a - l)t{p,) ' ' ""; - 1 + i;(2 - k) (1 + v{2 - K)f 

while the upper bounds intersect very close to: 



UbE^ (p.) ^ UBeoh(p.) ^ p, (47) 

Noting that for high enough effective coherence time $\kappa \approx 1$, the bounding gap is mostly
determined by the term $v$. Using Property 5 from Subsection V-A, $v$ is lower bounded by $v \ge \frac{1}{\alpha - 1}$
and hence the bounds can differ by approximately a factor of $\alpha$. However, typically, the effective
coherence time will not change that fast. If the effective coherence time is approximately constant
at the relevant SNRs ($\tilde{T}_c(\alpha\rho_x) \approx \tilde{T}_c(\rho_x)$) the ratio between the bounds will be approximately
$\frac{2\alpha - 1}{\alpha}$ (i.e., between 1.5 for $\alpha = 2$ and 2 for high $\alpha$).
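The two ratios quoted above can be reproduced with a short calculation. This is only a sketch under stated assumptions: $\kappa \approx 1$, both upper bounds are approximately $\rho_x$ at the intersection point, the leading term of the lower bound is $\rho_x v/(1+v)$, and $v = \alpha\rho_x\tilde{T}_c(\alpha\rho_x)/2$ evaluated at $\rho_x = 2/((\alpha-1)\tilde{T}_c(\rho_x))$. The function names are illustrative.

```python
def ratio_from_v(v: float) -> float:
    """Upper-to-lower bound ratio when UB ~ rho_x and LB ~ rho_x * v / (1 + v)."""
    return (1.0 + v) / v


def ratio_constant_coherence(alpha: float) -> float:
    """Effective coherence time unchanged between rho_x and alpha * rho_x."""
    v = alpha / (alpha - 1.0)
    return ratio_from_v(v)  # equals (2 * alpha - 1) / alpha


def ratio_worst_case(alpha: float) -> float:
    """Fastest decrease allowed by Property 5, i.e. v = 1 / (alpha - 1)."""
    v = 1.0 / (alpha - 1.0)
    return ratio_from_v(v)  # equals alpha


if __name__ == "__main__":
    for alpha in (2.0, 10.0, 1000.0):
        print(alpha, ratio_constant_coherence(alpha), ratio_worst_case(alpha))
```

For a slowly varying effective coherence time the ratio stays between 1.5 and 2 for any $\alpha \ge 2$, while the worst case allowed by Property 5 grows linearly with $\alpha$.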

For higher SNR, the bounding gap becomes smaller. Inspecting the lower bound (26), we have
two penalty terms compared to the upper bound (15). The first is a capacity penalty term (the
second term on the right hand side), and the second is an SNR penalty term (in the denominator
of (28) inside the log). As shown above, the bounds are tight for high enough SNR and high
enough effective coherence time. Taking as a reference an SNR of $\rho_x = 1$ ($\mathrm{UB}_{\mathrm{coh}}(\rho_x) \approx 0.6$ nats),
setting $r = N$, $\beta = 1$, and $\eta = 0.005$ will result in a capacity penalty term of 0.005 nats. If the
effective coherence time satisfies $\tilde{T}_c(\rho_x) \ge 330L + N$ then the SNR penalty term will be less
than 0.3 dB, and the gap between the bounds will be less than 5%. For higher SNRs (satisfying
the same condition on the effective coherence time) or for higher effective coherence times, the
gap will be even smaller. Note however that for high enough SNRs the effective coherence time
will typically decrease to a level that will not allow coherent communication, and the bounds
will diverge.^

Numerical examples showing the bounds gap are shown in the next subsection. 

C. Numerical example 

In order to visually demonstrate the bounds derived in the previous sections, we evaluate
them numerically for the auto-regressive channel model of the first order (AR1). This channel
model is defined by a single parameter, $\gamma$, the channel forgetting factor, and is characterized
by $C_{H_k, H_n} = \gamma^{|k-n|} C_H$. The effective coherence time for this channel can be calculated using
(12), and its low SNR limit is $T_{c_0} = N(1 + \gamma^2)/(1 - \gamma^2)$. Throughout this section we assume
$E[H_k] = 0$, equal power taps (i.e., $(C_h)_{l,l} = 1/L$ for $0 \le l < L$) and $N = 30$, $L = 5$.
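The low SNR limit for the AR1 model is easy to evaluate for the forgetting factors used in this subsection. The sketch below assumes the closed form $T_{c_0} = N(1+\gamma^2)/(1-\gamma^2)$ and, as a cross-check, the series $2N\sum_{i\ge 1}\gamma^{2i} + N$ obtained by taking the normalized correlation with the $i$-th past symbol to be $\gamma^{i}$. The function names are illustrative.

```python
def low_snr_coherence_time(gamma: float, n_bins: int) -> float:
    """Closed-form low SNR limit of the effective coherence time for the AR1 channel."""
    return n_bins * (1.0 + gamma ** 2) / (1.0 - gamma ** 2)


def low_snr_coherence_time_series(gamma: float, n_bins: int, terms: int = 200_000) -> float:
    """Same quantity from the truncated series 2N * sum_i gamma^(2i) + N (assumed form)."""
    total = 0.0
    power = 1.0
    g2 = gamma ** 2
    for _ in range(terms):
        power *= g2        # gamma^(2i) for i = 1, 2, ...
        total += power
    return 2.0 * n_bins * total + n_bins


if __name__ == "__main__":
    for gamma in (0.9672, 0.9851, 0.997, 0.9994):
        print(gamma, round(low_snr_coherence_time(gamma, 30)))
```

With $N = 30$ the four forgetting factors give values close to 900, 2,000, 10,000 and 50,000, matching the coherence times quoted for the figures.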

Figure 1 shows the effective coherence time of the AR1 channel for different values of the
channel forgetting factor. In all cases the effective coherence time reaches its limit, $T_{c_0}$, at low
enough SNR (for $T_{c_0} \ge 50{,}000$ this convergence is not seen in the figure as it happens at lower
SNR). For higher SNR the effective coherence time is a decreasing function of the SNR, until
it reaches $\tilde{T}_c(\rho_x) = N$ which is the lowest value measurable in our model.

Figure 2 shows the bounds on the capacity of the AR1 channel with quadratic power constraint
when the quadratic constraint constant in (11) is set to $\alpha = 10$. The figure shows the capacity
bounds when the channel forgetting factor is $\gamma = 0.9672$ ($T_{c_0} = 900$). The channel capacity is
upper bounded by the low SNR upper bound (17), which is effective for low SNRs,
and by the coherent channel upper bound, $\mathrm{UB}_{\mathrm{coh}}$ (51), which is effective at higher SNRs. In
order to demonstrate the role of the different lower bounds we draw 3 lower bound curves.
Two of the bounds are based on the QPSK bound (22). For low SNRs the tightest bound
uses narrowband QPSK signaling: $\max_{1 \le \beta \le \alpha} \mathrm{LB}_{\mathrm{QPSK}}(\rho_x, 1, \beta)$. For medium SNRs
the tightest bound uses wideband QPSK signaling: $\max_{1 \le \beta \le \alpha} \mathrm{LB}_{\mathrm{QPSK}}(\rho_x, N, \beta)$.
For high SNRs the tightest bound uses the truncated Gaussian signaling, (26):
$\sup_{0 < \eta < 1} \mathrm{LB}_{\mathrm{TG}}(\rho_x, N, 1, \eta)$. Note that the upper bounds intersect at $\rho_x = -38.5$ dB. At this
point the ratio between the lower and upper bounds is 0.5. For higher and lower SNR
the bounds get tighter, up to the point where, for high enough SNR, the effective coherence time
decreases too much and the bounds diverge.

^The two lower bounds in this work assume a standard coherent communication scheme, i.e., the receiver estimates the channel
based on past symbols, and then uses this estimate to detect the next symbol.

The figure also demonstrates the achievable rates using Gaussian signaling, and the accuracy
of the approximation suggested in Section V-A. Using Monte Carlo simulations, we evaluated
the bound for a proper complex Gaussian input distribution (using (79) with $\eta = 0$ and the
conditional channel covariance matrix from (84)). The resulting rate is depicted by x-marks in
Figure 2 (such an evaluation is of course feasible only for short coherence times). The figure also
depicts (in a dashed line) the resulting rate when the conditional channel covariance matrix is
approximated according to Properties 1-3 in Subsection V-A (which were derived for the case
of constant amplitude modulation, and are used here for the approximation of the capacity in
the case of Gaussian modulation). As can be seen, at least for the plotted case, the suggested
approximation is very accurate and the approximation error is negligible. The large number of
simulations performed has supported this claim, and shown that the approximation accuracy is
even better for channels with longer coherence times. Based on these observations we suggest
that Gaussian signaling may lead to an even tighter lower bound.

Figure 3 depicts the combined capacity bounds for various values of the channel forgetting
factor when the quadratic constraint constant is $\alpha = 2$. The upper bound (depicted by solid lines)
is the bound given by Theorem 1. The lower bound (depicted by dashed lines) is the maximum
of the bounds given by Theorems 2 and 3. The figure depicts the bounds for forgetting factors of
$\gamma = 0.9851$, $0.997$, $0.9994$ (which correspond to $T_{c_0} = 2{,}000$, $10{,}000$, and $50{,}000$ respectively).

As can be seen, the bounds are good in most of the range. The bounds are least tight at
SNR values in which the two upper bounds intersect. At these SNRs the ratio between the upper
and lower bounds is 1.89, 1.64 and 1.59 for $T_{c_0} = 2{,}000$, $10{,}000$ and $50{,}000$ respectively, very
close to the ratio predicted in Section V-B for the case of slowly changing $\tilde{T}_c(\rho_x)$. For higher
and lower SNR the bounds are much tighter. In particular, for high SNR all lower bounds are
close to the coherent channel upper bound, but the lower bound is tighter for channels with 
higher effective coherence time. 

The x-marks in the top right end of the lower bounds (for $T_{c_0} = 2{,}000$ and $T_{c_0} = 10{,}000$) mark
the point at which the effective coherence time dropped below $\tilde{T}_c(\rho_x) = 2N = 60$. For higher
SNR the lower bounds will not be tight. More importantly, for higher SNR our model will not 
represent well the physical channel, because the assumption that the channel impulse response 
does not change during one OFDM symbol can no longer be justified. For longer coherence 
times, this SNR threshold is higher and hence not shown in the figure. For any SNR below the 
x-marks, the bounds derived in the previous sections are good, and describe the channel capacity 
with high accuracy. 

D. Comparison to known results 

1) Closest results: The bounds which are most similar to the results presented here were 
derived by Sethuraman et al. [7] and by Durisi et al. [3]. Sethuraman et al. present an upper 
bound for the frequency-flat fading case ($N = 1$) with a peak power constraint. In this case
the upper bound in [7] is tighter than the bounds in (18), but the difference is negligible for
underspread channels ($T_{c_0} \gg N$). The difference in the upper bound results from the relaxation
$\log(1 + x) \le x$ in (55). This relaxation can be easily avoided, but the resulting bounds are much
more complicated for $N > 1$ (in particular in the context of underspread channels and effective
coherence time) while the difference between the bounds is negligible. Durisi et al. presented
upper and lower bounds for frequency selective fading channels, but only for the case of a peak 
power constraint both in time and in frequency. Both works also presented results on low SNR 
capacity asymptote that will be discussed in the next subsection. 

2) Low SNR limit when $T_{c_0}$ exists: In the low SNR limit, if the limit of the effective coherence
time, $T_{c_0}$, exists as the SNR goes to zero (37), the channel capacity is known and matches the
results presented above both with quadratic power constraint and peak power constraint. For the
quadratic power constraint [5], the capacity at the low SNR limit is $\alpha T_{c_0} \rho_x^2 / 2$, which is exactly
equal to the low SNR limit of the upper bound (19). Note that the low SNR limit of the lower
bound (25) is lower by a factor of $(T_{c_0} - N)/T_{c_0}$ than the upper bound, which is negligible for
$T_{c_0} \gg N$. (The main reason for this difference between the upper and lower bounds is the lower
bounding transmission scheme which assumes that each OFDM symbol ($N$ samples) is decoded
using channel estimation based only on past symbols.)

For the peak power constraint flat fading case ($N = 1$), the capacity is $\rho_x^2 (T_{c_0} - 1)/2$ [7],
which matches the results presented here (again up to a negligible factor of $(T_{c_0} - N)/T_{c_0}$ as
discussed above). Note that for the peak power constraint we did not analyze the case where
the average power constraint is lower than the peak power constraint.

For $N > 1$ the same work shows that the low SNR capacity asymptote for the delay separable
underspread stationary channel is also given by $\rho_x^2 (T_{c_0} - 1)/2$. Under the assumption $T_{c_0} \gg N$
this channel is approximately equal to the OFDM channel considered here, and the low SNR
asymptotes also match. This result also matches the low SNR asymptotes presented in [3] for 
the case of peak power constraint only in time. It is interesting to note that for this case the peak 
power constraint used in [7], [3] is stricter than the peak power constraint used here. The power 
constraint in this work limits the energy of a single OFDM symbol. Translating to the time 
domain the constraint is applied to the average energy of groups of N samples. On the other 
hand the constraint in [7] applies for each time domain sample. Interestingly, the bounds are 
almost identical, even though [7] clearly shows that relaxing the peak power constraint results 
in a higher capacity for the same average transmission power. Comparing these two works one 
can conclude that the relaxation of the peak power constraint is effective only if it is applicable 
for time periods which are at least on the order of the channel coherence time. Allowing signal
"peakiness" which must be averaged over periods ($N$) which are significantly shorter than the
channel coherence time ($T_{c_0}$) is not sufficient to increase the channel capacity.

Since the effective coherence time can be interpreted as the block length of a block fading 
model that achieves the same capacity, our results naturally match results that were derived for 
the block fading model (e.g., [6], [8]). 

3) Analysis for a given channel estimation error: Several works (e.g. [18], [24]) consider 
the effect of channel estimation error with a given variance on the channel capacity. These 
works have been the basis for the lower bounds presented here, but they miss the effect of the 
transmitted signal on the ability to estimate the channel ([18] considers also an estimation from 
an out-of-band pilot signal). In this sense, one can say that the QPSK lower bound presented 
above (Theorem 2) is the most straightforward part of the work. It combines an extension of 
the bounds in [18], [24] to the OFDM model, the channel estimation scheme of [25], [26]
which allows estimating the channel using all (relevant) past transmitted and received symbols,
and results on channel estimation errors for constant amplitude modulations. Perhaps the most
important contribution of Theorem 2 is the derivation of the bound in a way that shows the role
of the effective coherence time. Note that the truncated Gaussian lower bound (Theorem 3) is
more complicated as the modulation is not constant amplitude, and the upper bounds are more
complicated as they cannot assume any estimation scheme.

4) High SNR limit: As detailed above, for regular channels our analysis holds only up to 
some finite SNR. The results show that the coherent channel upper bound is achievable as long 
as the effective coherence time is large enough. This is in agreement with known results [12], 
[13], [14] showing that in the very high SNR limit the coherent channel upper bound is not 
achievable. Our analysis cannot predict the actual double logarithmic behavior of the capacity, 
as it happens at too high SNR where the assumption on the value of the effective coherence 
time is not valid. 

For irregular channels, although not proven, the effective coherence time also typically
decreases as the SNR increases (although at a lower rate). Hence, our analysis is limited in SNR even
for irregular channels.

The modeling problem at high SNR is also discussed by Durisi et al. [27], where they focus 
on the characterization of the range of SNRs in which the channel discretization is reliable. 
Note that their approach is quite different from the one taken above. In [27], the analysis starts 
from a continuous time channel, and tries to analyze all modeling errors in the discretization 
process. In the analysis above we study the behavior of a discrete time OFDM model without 
assuming any discretization imperfections. Yet, we reach the same conclusion: even the discrete
time OFDM model reveals the high SNR model limit.

5) Dependence of coherence time on SNR: Few works analyze the dependence between the
SNR and the coherence time. In these works the coherence time characterizes the channel (and
does not change for a given channel). For example Zheng et al. [28] consider a block-fading channel,
where the channel remains constant for a block of $l$ symbols, before changing to an independent
realization. Chen and Veeravalli [29] consider a block stationary channel where the fading is
constant across a block of length $T$ and changes in a stationary manner between blocks.

In both cases the analysis considers a set of channels, and tries to characterize relations between 
the channel coherence time and the system SNR that will result in a certain capacity behavior. 
Chen and Veeravalli show ([29] equation (21)) that two systems with identical block correlation
have the same capacity if the product of their peak powers and block lengths ($T \cdot \mathrm{SNR}$) is
identical.

Zheng et al. analyze the limit capacity of a set of channels with increasing coherence time $l$
and decreasing SNRs. They show that different power-law relations between the coherence time $l$
and the SNR achieve different capacity behavior between the coherent and noncoherent
scheme.

Our analysis is completely different. We consider a (single) specific channel, and show that 
its effective coherence time is a function of the system SNR. 

VI. Channel capacity and bounds 

In this section we analyze the channel capacity and give the information-theoretical part of 
the theorem proofs. 

In general, even if the input symbols are independent, the output symbols are dependent due
to the channel memory. Therefore, we need to evaluate the capacity over the entire transmission,
defined as:

$$C = \lim_{n\to\infty} \frac{1}{N(n+1)} \sup_{\Pr(X_0^n)} I\left(X_0^n; Y_0^n\right). \tag{48}$$

Note that the ergodic requirement in Assumption IV guarantees the achievability of the capacity
(see for example [12], [6]).

Most of the bounds derived in this section depend on the conditional distribution of the channel
given the past transmitted and received symbols. This distribution will be analyzed in detail in
the next section. For this section, it is enough to state that given the past symbols, the channel
has a complex Gaussian distribution $H_k | X_0^{k-1}, Y_0^{k-1} \sim \mathcal{CN}(\mu_k, \mathcal{E}_k)$. Lemma 3.a also uses the
distribution of the channel mean:

$$\mu_k | X_0^{k-1} \sim \mathcal{CN}\left(E[H_k], C_H - \mathcal{E}_k\right). \tag{49}$$

A. Coherent channel upper bound 

The first upper bound we use is the well known channel capacity when the channel is known
to the receiver (coherent channel) with only average power constraint:

$$\begin{aligned}
C &\le \lim_{n\to\infty} \frac{1}{N(n+1)} \sup_{\Pr(X_0^n)} I\left(X_0^n; Y_0^n, H_0^n\right) \\
&\le \lim_{n\to\infty} \frac{1}{N(n+1)} \sum_{k=0}^{n} \sup_{\Pr(X_k)} I\left(X_k; Y_k, H_k\right) \\
&= \frac{1}{N} \sup_{\Pr(X_0)} I\left(X_0; Y_0, H_0\right).
\end{aligned} \tag{50}$$

Using Assumptions IV and VI, this bound is maximized when the input signal has Gaussian
distribution with $E[X_k] = 0$, and $C_{X_k} = \rho_x \cdot I$ [30]. The maximal mutual information is given
by:

Result 1: The channel capacity is upper bounded by

$$C \le \mathrm{UB}_{\mathrm{coh}}(\rho_x) = E_{h_{0,0}}\left[\log\left(1 + N\rho_x |h_{0,0}|^2\right)\right]. \tag{51}$$

The bound in (51) is the capacity of the channel with a constraint only on the average power.
As both types of power constraints analyzed here must satisfy $E[p_k] \le N\rho_x$, this bound holds
for both types of constraints. As in the case of the block fading model [21], we will show that
for large enough effective coherence times and SNR this bound is tight, and the channel capacity 
does not degrade due to the lack of channel knowledge. 
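The coherent bound in (51) is straightforward to evaluate numerically for a zero mean channel. The sketch below assumes $h_{0,0} \sim \mathcal{CN}(0, 1/N)$, so that $N|h_{0,0}|^2$ is exponentially distributed with unit mean and the bound reduces to $\int_0^\infty \log(1+\rho_x t)\,e^{-t}\,dt$. The function name and the integration settings are illustrative choices.

```python
import math


def ub_coh(rho_x: float, upper: float = 60.0, steps: int = 200_000) -> float:
    """Trapezoidal evaluation of E[log(1 + rho_x * t)] for t ~ Exp(1), in nats."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.log1p(rho_x * t) * math.exp(-t)
    return total * h


if __name__ == "__main__":
    for snr_db in (-20, -10, 0, 10):
        rho = 10.0 ** (snr_db / 10.0)
        print(f"{snr_db:>4} dB: {ub_coh(rho):.4f} nats")
```

At $\rho_x = 1$ this gives approximately 0.6 nats, and at low SNR the value approaches $\rho_x$, as expected from $\log(1+x) \approx x$.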



B. Low SNR upper bound 

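The channel capacity is upper bounded by the following lemma:

Lemma 1.a: The capacity of a channel with a peak power constraint is upper bounded by:

$$C \le \lim_{k\to\infty} \left\{ \rho_x \left(1 + N \left|E[h_{0,0}]\right|^2\right) - \frac{1}{N} \log\left(1 + N^2 \rho_x \min_m \inf_{X_0^{k-1} \in \mathcal{X}(\mathcal{P}_{k-1})} (\mathcal{E}_k)_{m,m}\right) \right\}, \tag{52}$$

and the capacity of a channel with a quadratic power constraint is upper bounded by:

$$C \le \lim_{k\to\infty} \left\{ N\rho_x \left|E[h_{0,0}]\right|^2 + \sqrt{\alpha}\,\rho_x \sqrt{\sup_{\Pr(\mathcal{P}_{k-1})} E\left[\left(1 - N \min_m \inf_{X_0^{k-1} \in \mathcal{X}(\mathcal{P}_{k-1})} (\mathcal{E}_k)_{m,m}\right)^2\right]} + \frac{1}{2}\alpha N \rho_x^2 \right\}, \tag{53}$$

where $\mathcal{X}(\mathcal{P}_k) = \{X_0^k : X_i^\dagger X_i = p_i,\ i = 0, \ldots, k\}$ is the set of all input symbols that correspond
to the power matrix $\mathcal{P}_k$.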
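Proof of Lemma 1.a: We start with the inequality:

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{N(n+1)} I\left(X_0^n; Y_0^n\right)
&= \lim_{n\to\infty} \frac{1}{N(n+1)} \sum_{k=0}^{n} \left[ h\left(Y_k | Y_0^{k-1}\right) - h\left(Y_k | X_0^n, Y_0^{k-1}\right) \right] \\
&\le \lim_{n\to\infty} \frac{1}{N(n+1)} \sum_{k=0}^{n} \left[ h\left(Y_k\right) - h\left(Y_k | X_0^k, Y_0^{k-1}\right) \right]
\end{aligned} \tag{54}$$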



where $h(\cdot)$ is the differential entropy. The first term in the last line of (54) can be upper bounded
by the maximal entropy of a random vector with a given covariance matrix. The resulting entropy
is [31]:

$$h(Y_k) \le \log\left(\left\|C_{Y_k}\right\|\right) + N\log(\pi e) \le \mathrm{Tr}\left(E\left[Y_k Y_k^\dagger\right] - I\right) + N\log(\pi e). \tag{55}$$

where the last inequality uses $\log(\|I + A\|) \le \mathrm{Tr}(A)$ which holds for any positive semi-definite
matrix $A$.^ Note that this inequality is tight for low SNR if the transmitted signal has zero mean.
Substituting the covariance matrix:

$$E\left[Y_k Y_k^\dagger\right] = N \cdot E\left[\mathrm{diag}(X_k) H_k H_k^\dagger \mathrm{diag}(X_k)^\dagger\right] + I, \tag{56}$$



we get:

$$\begin{aligned}
h(Y_k) - N\log(\pi e) &\le N \cdot \mathrm{Tr}\left(E\left[S_{X_k}\right]\left(C_H + E[H_k]E[H_k]^\dagger\right)\right) \\
&= E\left[X_k^\dagger X_k\right]\left(1 + N\left|E[h_{0,0}]\right|^2\right)
\end{aligned} \tag{57}$$

where the first line uses the rotation property of the trace and the spectrum definition, (2). The
second line uses the fact that $S_{X_k}$ is a diagonal matrix, while all elements on the diagonal
of the second term are equal. We also use the normalization $\mathrm{Tr}[C_H] = 1$, which results in
$(C_H)_{m,m} = 1/N$.

Turning to the second entropy in the last line of (54), we note that the conditional distribution
of the output given the input is Gaussian. We also observe that:

$$C_{Y_k | X_0^k, Y_0^{k-1}} = N \cdot \mathrm{diag}(X_k)\, \mathcal{E}_k\, \mathrm{diag}(X_k)^\dagger + I, \tag{58}$$

^Using $\|I + A\| = \prod_{i=0}^{N-1} (1 + \lambda_i)$ and $\mathrm{Tr}(A) = \sum_{i=0}^{N-1} \lambda_i$, where $\lambda_0, \ldots, \lambda_{N-1}$ are the eigenvalues of the matrix $A$, and the
inequality $\log(1 + x) \le x$.
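where $\mathcal{E}_k$ is the conditional channel covariance matrix given the past transmitted and received
symbols. Using (9), the entropy is written as:

$$\begin{aligned}
h\left(Y_k | X_0^k, Y_0^{k-1}\right) - N\log(\pi e) &= E_{X_0^k, Y_0^{k-1}}\left[\log\left\|C_{Y_k | X_0^k, Y_0^{k-1}}\right\|\right] \\
&= E_{X_0^k}\left[\log\left\|N\,\mathrm{diag}(X_k)\,\mathcal{E}_k\,\mathrm{diag}(X_k)^\dagger + I\right\|\right] \\
&= E_{X_0^k}\left[\log\left\|N \mathcal{E}_k S_{X_k} + I\right\|\right] \\
&\ge E_{\mathcal{P}_k}\left[\inf_{X_0^k \in \mathcal{X}(\mathcal{P}_k)} \log\left\|N \mathcal{E}_k S_{X_k} + I\right\|\right]
\end{aligned} \tag{59}$$

where we use the rotation property $\|I + AB\| = \|I + BA\|$.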
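Now, we observe that since $S_{X_k}$ is diagonal and nonnegative,

$$\mathrm{Tr}\left(\mathcal{E}_k S_{X_k}\right) \ge \min_m (\mathcal{E}_k)_{m,m}\, \mathrm{Tr}\left(S_{X_k}\right). \tag{60}$$

Using also the inequality $\log\|I + A\| \ge \log(1 + \mathrm{Tr}(A))$ which holds for any positive
semi-definite matrix $A$,^ the term inside the expectation in the last row of (59) is lower bounded
by:

$$\begin{aligned}
\inf_{X_0^k \in \mathcal{X}(\mathcal{P}_k)} \log\left\|N \mathcal{E}_k S_{X_k} + I\right\| &\ge \inf_{X_0^k \in \mathcal{X}(\mathcal{P}_k)} \log\left(1 + N\,\mathrm{Tr}\left(\mathcal{E}_k S_{X_k}\right)\right) \\
&\ge \log\left(1 + N p_k \min_m \inf_{X_0^{k-1} \in \mathcal{X}(\mathcal{P}_{k-1})} (\mathcal{E}_k)_{m,m}\right)
\end{aligned} \tag{61}$$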



Note that this lower bound is achievable using a transmission spectrum that concentrates all of 
the power on the frequency bin that has minimal estimation error. 
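Both determinant inequalities used in this proof depend only on the eigenvalues of the positive semi-definite matrix $A$, so they can be spot-checked without any matrix library. The sketch below draws random nonnegative eigenvalues (an arbitrary illustrative choice) and checks $\log(1 + \mathrm{Tr}(A)) \le \log\|I + A\| \le \mathrm{Tr}(A)$; the function name is hypothetical.

```python
import math
import random


def determinant_bounds(eigenvalues):
    """Return (lower, value, upper) for log det(I + A) given the eigenvalues of A."""
    trace = sum(eigenvalues)
    log_det = sum(math.log1p(lam) for lam in eigenvalues)  # log prod(1 + lambda_i)
    lower = math.log1p(trace)                              # log(1 + Tr(A))
    upper = trace                                          # Tr(A)
    return lower, log_det, upper


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        eigs = [rng.expovariate(2.0) for _ in range(8)]
        lower, value, upper = determinant_bounds(eigs)
        print(f"{lower:.4f} <= {value:.4f} <= {upper:.4f}")
```

The lower bound holds with equality when $A$ has a single nonzero eigenvalue, which is consistent with concentrating all of the power on one frequency bin, while the upper bound becomes tight as all eigenvalues tend to zero (low SNR).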
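Substituting (61) and (57) in (54) results in:

$$C \le \lim_{n\to\infty} \frac{1}{N(n+1)} \sum_{k=0}^{n} \sup_{\Pr(\mathcal{P}_k)} \left\{ E[p_k]\left(1 + N\left|E[h_{0,0}]\right|^2\right) - E\left[\log\left(1 + N p_k \min_m \inf_{X_0^{k-1} \in \mathcal{X}(\mathcal{P}_{k-1})} (\mathcal{E}_k)_{m,m}\right)\right] \right\} \tag{62}$$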



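Noting also that the resulting quantity is monotonically nondecreasing in $k$, the capacity is upper
bounded by:

$$C \le \frac{1}{N} \lim_{k\to\infty} \sup_{\Pr(\mathcal{P}_k)} \left\{ E[p_k]\left(1 + N\left|E[h_{0,0}]\right|^2\right) - E\left[\log\left(1 + N p_k \min_m \inf_{X_0^{k-1} \in \mathcal{X}(\mathcal{P}_{k-1})} (\mathcal{E}_k)_{m,m}\right)\right] \right\} \tag{63}$$

^Using $\sum_i \log(1 + \lambda_i) \ge \log\left(1 + \sum_i \lambda_i\right)$ since $\lambda_i \ge 0$.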
Subject to the peak power constraint, for every $k$ the bound is maximized if $p_k = N\rho_x$ (as
$p_k$ appears inside the log in the second (negative) term). Substituting this optimal transmission
power in (63) results in (52) and proves the first part of the lemma.

In order to derive the bound for the quadratic power constraint we use the inequality log(l + 
x) > X — ^x^ and simplify (63) to: 



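$$C \le \frac{1}{N}\lim_{k\to\infty}\sup_{p(p_k)}\left\{E[p_k]\left(1 + N\left|E[h_{0,0}]\right|^2\right) - E\left[Np_k\min_m\inf_{X_0^{k-1}\in\mathcal{X}(\mathcal{P}_{k-1})}(\mathcal{E}_k)_{m,m}\right] + \frac{1}{2}E\left[N^2p_k^2\left(\min_m\inf_{X_0^{k-1}\in\mathcal{X}(\mathcal{P}_{k-1})}(\mathcal{E}_k)_{m,m}\right)^2\right]\right\}. \qquad (64)$$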



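Using $(\mathcal{E}_k)_{m,m} \le (C_{H_0})_{m,m} = 1/N$, we rewrite (64) as:

$$C \le \frac{1}{N}\lim_{k\to\infty}\sup_{p(p_k)}\left\{NE[p_k]\left|E[h_{0,0}]\right|^2 + E\left[p_k\left(1 - N\min_m\inf_{X_0^{k-1}\in\mathcal{X}(\mathcal{P}_{k-1})}(\mathcal{E}_k)_{m,m}\right) + \frac{1}{2}p_k^2\right]\right\} \qquad (65)$$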



and use the Cauchy-Schwarz inequality and the power constraint, (11), to prove (53). ■

The exact values of the bounds depend on the conditional channel distribution, through $\min_m\inf_{X_0^{k-1}\in\mathcal{X}(\mathcal{P}_{k-1})}(\mathcal{E}_k)_{m,m}$. This distribution is analyzed in the next section.



C. QPSK lower bound 

For the lower bound we need to select an input distribution. The first lower bound derived herein is based on QPSK modulation. This bound applies to both power constraints. For the quadratic power constraint, the bound is especially significant in the low SNR regime, where it is crucial to use a low fourth moment. The QPSK modulation uses constant amplitude and hence minimizes the ratio between the fourth moment of the transmitted signal and its squared second moment.

The input symbol is given by:

$$X_k = \sqrt{\beta\frac{N}{r}\rho_x}\;g\,B_k, \qquad (66)$$

where $B_k = [b_{k,0},\ldots,b_{k,r-1},0,\ldots,0]^T$, $r$ ($1 \le r \le N$) is the number of active frequency bins, $\{b_{k,m}\}$ is the sequence of iid QPSK data, and $b_{k,m}\in\{\pm1,\pm j\}$ with equal probability. On the other hand, $g\in\{0,1\}$ is a binary random variable, statistically independent of the data $\{b_{k,m}\}$, that determines whether the system will transmit or not (i.e., a single variable that affects all transmitted symbols). The transmission probability is $\Pr(g=1) = 1/\beta$. This distribution is convenient for analysis, although it is not reasonable for practical systems (one cannot consider a practical system that has a positive probability not to transmit any symbol at any time). A practical system can achieve the same performance by transmitting in blocks with large gaps between the blocks, as long as the block length is significantly larger than the channel coherence time.
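
The role of the on/off variable $g$ can be illustrated by computing the moments of the symbol power $p = X_k^\dagger X_k$ exactly. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the quadratic constraint (11) limits $E[p^2]$ to $\alpha$ times $(E[p])^2$, which is what makes the range $1 \le \beta \le \alpha$ natural. Under the input (66) the average power is $N\rho_x$ regardless of $\beta$ and $r$, while the normalized fourth moment equals $\beta$.

```python
from fractions import Fraction


def onoff_qpsk_moments(n_bins, r_active, rho_x, beta):
    """Exact E[p] and E[p^2] of the symbol power p = X^H X for
    X = sqrt(beta * N / r * rho_x) * g * B with Pr(g = 1) = 1 / beta."""
    per_bin_power = beta * Fraction(n_bins, r_active) * rho_x
    # Given g = 1 each active bin has constant modulus, so p is deterministic.
    p_on = r_active * per_bin_power
    prob_on = 1 / Fraction(beta)
    mean_p = prob_on * p_on
    mean_p2 = prob_on * p_on ** 2
    return mean_p, mean_p2


if __name__ == "__main__":
    mean_p, mean_p2 = onoff_qpsk_moments(8, 4, Fraction(1, 2), 3)
    print("E[p] =", mean_p, " E[p^2]/E[p]^2 =", mean_p2 / mean_p ** 2)
```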

Lemma 2.a: The capacity of a channel with a quadratic power constraint is lower bounded by:

$$C \ge \lim_{k\to\infty}\max_{r\le N}\max_{1\le\beta\le\alpha}\frac{1}{N\beta}\sum_{m=0}^{r-1}E\left[C_{\mathrm{QPSK}}\left(\frac{\beta\frac{N^2}{r}\rho_x|\mu_{k,m}|^2}{1 + \beta\frac{N^2}{r}\rho_x(\mathcal{E}_k)_{m,m}}\right)\right] \qquad (67)$$

where $C_{\mathrm{QPSK}}(\rho)$, defined in (23), is the achievable mutual information for QPSK transmission over an AWGN channel with SNR equal to $\rho$ (see for example [20], which also gives some useful bounds). The capacity of a channel with a peak power constraint is lower bounded by:

$$C \ge \lim_{k\to\infty}\max_{r\le N}\frac{1}{N}\sum_{m=0}^{r-1}E\left[C_{\mathrm{QPSK}}\left(\frac{\frac{N^2}{r}\rho_x|\mu_{k,m}|^2}{1 + \frac{N^2}{r}\rho_x(\mathcal{E}_k)_{m,m}}\right)\right] \qquad (68)$$

Proof of Lemma 2.a: As $g$ is constant throughout the transmission, we can safely assume that after a long enough time the receiver can decode $g$ with no error. Thus, we can write:

$$\lim_{n\to\infty}\frac{1}{N(n+1)}I(X_0^n;Y_0^n) = \lim_{n\to\infty}\frac{1}{N(n+1)}I(X_0^n;Y_0^n|g) = \frac{1}{\beta}\lim_{n\to\infty}\frac{1}{N(n+1)}I(B_0^n;Y_0^n|g=1). \qquad (69)$$



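Given $g = 1$ the transmitted symbols are statistically independent. We use the following equality:

$$\begin{aligned}
I(B_0^n;Y_0^n|g=1) &= \sum_{k=0}^{n}I(B_k;Y_k|g=1,B_0^{k-1},Y_0^{k-1},Y_{k+1}^n) \\
&\quad+ \sum_{k=0}^{n}I(B_k;Y_0^{k-1}|g=1,B_0^{k-1},Y_{k+1}^n) \\
&\quad+ \sum_{k=0}^{n}I(B_k;Y_{k+1}^n|g=1,B_0^{k-1}). \qquad (70)
\end{aligned}$$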



The second and third terms on the right hand side of this equation correspond to the mutual information between an input symbol and the past or future output symbols. As these terms do not use the $k$-th symbol output, they both vanish when the input symbols are iid. The first term can be lower bounded by:

$$\begin{aligned}
I(B_0^n;Y_0^n|g=1) &= \sum_{k=0}^{n}I(B_k;Y_k|g=1,B_0^{k-1},Y_0^{k-1},Y_{k+1}^n) \\
&\ge \sum_{k=0}^{n}I(B_k;Y_k|g=1,B_0^{k-1},Y_0^{k-1}) \\
&\ge \sum_{k=0}^{n}\sum_{m=0}^{r-1}I(b_{k,m};y_{k,m}|g=1,B_0^{k-1},Y_0^{k-1}), \qquad (71)
\end{aligned}$$

where the second line uses the statistical independence of $Y_{k+1}^n$ and $B_k$, and the third line uses the statistical independence between the elements of $B_k$. Substituting (71) in (69), and noting again that the term inside the summation is monotonic in $k$, results in the lower bound on the capacity:

$$C \ge \lim_{k\to\infty}\frac{1}{N\beta}\sum_{m=0}^{r-1}I(b_{k,m};y_{k,m}|B_0^{k-1},Y_0^{k-1},g=1). \qquad (72)$$

Note that the past symbols affect the distribution of the current symbol only through the conditional distribution of the channel. Thus, in this lower bound, the decoding of correlated output symbols is replaced by the decoding of uncorrelated symbols with channel state information (CSI). This CSI is produced by optimal channel estimation based on past input and output symbols. A coding scheme that allows decoding the previous symbols and using them for channel estimation for the next symbol is shown in [25], [26].

Focusing on the $m$-th frequency bin of the $k$-th symbol, given $g = 1$, the channel output is:

$$y_{k,m} = \sqrt{\beta\frac{N}{r}\rho_xN}\;\mu_{k,m}b_{k,m} + w_{k,m} + v_{k,m}, \qquad (73)$$

where $w_{k,m}$ is the $m$-th element of the DFT of the complex Gaussian noise, and $v_{k,m} = \sqrt{\beta\frac{N}{r}\rho_xN}\,(h_{k,m} - \mu_{k,m})\,b_{k,m}$ is the interference term due to channel estimation error. Since the estimation error is a circularly symmetric Gaussian random variable and the data has constant amplitude, the interference term is also Gaussian and is statistically independent of the data ($b_{k,m}$). Assuming optimal channel estimation, the resulting channel is equivalent to a coherent flat fading channel with additive independent complex Gaussian noise samples of variance $1 + \beta\frac{N^2}{r}\rho_x(\mathcal{E}_k)_{m,m}$. Evaluating the mutual information of this channel for QPSK modulation and multiplying it by the transmission probability ($1/\beta$) leads to (67) and (68), depending on the allowed range of the parameter $\beta$. ■
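
The function $C_{\mathrm{QPSK}}(\rho)$ that appears in (67) and (68) has no closed form, but it is easy to evaluate numerically. The sketch below is one possible evaluation and is not taken from the paper: it assumes unit-variance complex noise, splits QPSK into two independent binary channels, integrates with the trapezoidal rule, and reports the result in bits (definition (23) may use a different unit). The function names are illustrative.

```python
import math


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 30.0:
        return x
    return math.log1p(math.exp(x))


def bpsk_mi_bits(snr, half_width=10.0, steps=4000):
    """Mutual information (bits) of equiprobable BPSK over real AWGN,
    where snr = amplitude^2 / noise variance, by trapezoidal integration."""
    s = math.sqrt(snr)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += weight * pdf * _softplus(-2.0 * s * (s + t))
    return 1.0 - total * h / math.log(2.0)


def qpsk_mi_bits(snr):
    """QPSK over complex AWGN with unit noise variance and SNR = snr:
    two independent binary channels, each with amplitude^2 / variance = snr."""
    return 2.0 * bpsk_mi_bits(snr)


if __name__ == "__main__":
    for snr in (0.1, 1.0, 10.0, 100.0):
        print(snr, round(qpsk_mi_bits(snr), 4))
```

The value saturates at 2 bits for large SNR, which is the limitation that motivates the truncated Gaussian input of the next subsection.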
D. Truncated Gaussian lower bound

For high SNR, the QPSK bound will not be tight (most obviously as it is limited to 2 bits 
per frequency bin). The most intuitive alternative is a complex Gaussian input distribution. Such 
a distribution seems very close to optimal (based on a large set of simulations). However, we 
were unable to produce an appropriate bound on the resulting conditional channel variance, and 
therefore turned to a slightly modified distribution. 

Let $u$ be a proper complex Gaussian random variable, $u \sim \mathcal{CN}(0,1)$. We denote hereafter by truncated Gaussian distribution with parameters $\eta$ and $\xi$ the distribution of $u$ given $\eta < |u|^2 < \xi$. In mathematical terms, we say that $z \sim \mathcal{TCN}(\eta,\xi)$ if for any function $f(\cdot)$ we have:

$$E[f(z)] = E\left[f(u)\,\big|\,\eta < |u|^2 < \xi\right]. \qquad (74)$$

Note that $|u|^2$ has an exponential distribution with parameter 1. Define:

$$\theta = \Pr\left(\eta < |u|^2 < \xi\right) = e^{-\eta} - e^{-\xi}. \qquad (75)$$

The second moment of the truncated Gaussian distribution is:

$$\rho_z = E\left[|z|^2\right] = \frac{1}{\theta}\int_\eta^\xi xe^{-x}\,dx = 1 + \frac{\eta e^{-\eta} - \xi e^{-\xi}}{e^{-\eta} - e^{-\xi}}. \qquad (76)$$

The fourth moment of the truncated Gaussian distribution is:

$$E\left[|z|^4\right] = \frac{1}{\theta}\int_\eta^\xi x^2e^{-x}\,dx = 2 + \frac{(\eta^2+2\eta)e^{-\eta} - (\xi^2+2\xi)e^{-\xi}}{e^{-\eta} - e^{-\xi}}. \qquad (77)$$

Using the truncated Gaussian distribution, we define the input symbol as:

$$X_k = \frac{c}{\sqrt{\rho_z}}\,g\,Z_k, \qquad (78)$$

where $c$ is chosen to satisfy the power constraint, $g$ is defined as in the QPSK lower bound, $Z_k = [z_{k,0},\ldots,z_{k,r-1},0,\ldots,0]^T$, $1 \le r \le N$ is the number of active frequency bins, and $\{z_{k,m}\}$ is the sequence of iid input random variables with truncated Gaussian distribution $z_{k,m} \sim \mathcal{TCN}(\eta,\xi)$.
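
The closed forms (75) to (77) can be cross-checked against direct numerical integration of the truncated exponential density. The following sketch does that with the midpoint rule; the function names are illustrative and the case $\xi = \infty$ (used later for the quadratic constraint) is handled explicitly.

```python
import math


def truncated_gaussian_moments(eta, xi):
    """Closed forms (75)-(77): theta, E|z|^2 and E|z|^4 for z ~ TCN(eta, xi).
    xi may be math.inf."""
    e_eta = math.exp(-eta)
    if math.isinf(xi):
        e_xi, xi_term2, xi_term4 = 0.0, 0.0, 0.0
    else:
        e_xi = math.exp(-xi)
        xi_term2 = xi * e_xi
        xi_term4 = (xi * xi + 2.0 * xi) * e_xi
    theta = e_eta - e_xi
    rho_z = 1.0 + (eta * e_eta - xi_term2) / theta
    fourth = 2.0 + ((eta * eta + 2.0 * eta) * e_eta - xi_term4) / theta
    return theta, rho_z, fourth


def numeric_moment(eta, xi, power, steps=200000):
    """Midpoint-rule value of (1/theta) * integral_eta^xi x^power e^{-x} dx."""
    h = (xi - eta) / steps
    acc = 0.0
    for i in range(steps):
        x = eta + (i + 0.5) * h
        acc += x ** power * math.exp(-x)
    theta = math.exp(-eta) - math.exp(-xi)
    return acc * h / theta


if __name__ == "__main__":
    theta, rho_z, fourth = truncated_gaussian_moments(0.5, 3.0)
    print(theta, rho_z, numeric_moment(0.5, 3.0, 1))
    print(fourth, numeric_moment(0.5, 3.0, 2))
```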
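Lemma 3.a: If $E[h_{0,m}] = 0$, the capacity of a channel with a quadratic power constraint (and $\alpha > 1$) is lower bounded by:

$$C \ge \lim_{k\to\infty}\max_{r\le N}\sup_{\eta>0}\max_{1\le\beta\le\frac{\alpha r(1+\eta)^2}{1+r(1+\eta)^2}}\left\{-\frac{r\eta}{N\beta} + \frac{1}{N\beta}\sum_{m=0}^{r-1}E\left[\log\left(1 + \frac{\beta\frac{N^2}{r}\rho_x\left(\frac{1}{N} - E\left[(\mathcal{E}_k)_{m,m}\right]\right)|u|^2}{1 + \beta\frac{N^2}{r}\rho_xE\left[(\mathcal{E}_k)_{m,m}\right]}\right)\right]\right\} \qquad (79)$$

and the capacity of a channel with a peak power constraint is lower bounded by:

$$C \ge \lim_{k\to\infty}\max_{r\le N}\sup_{0\le\eta<\xi}\left\{\frac{1}{N}\sum_{m=0}^{r-1}E\left[\log\left(1 + \frac{\frac{N^2\rho_x\rho_z}{r\xi}\left(\frac{1}{N} - E\left[(\mathcal{E}_k)_{m,m}\right]\right)|u|^2}{1 + \frac{N^2\rho_x\rho_z}{r\xi}E\left[(\mathcal{E}_k)_{m,m}\right]}\right)\right] + \frac{r}{N}\log\left(e^{-\eta} - e^{-\xi}\right)\right\} \qquad (80)$$

where $u$ is a proper complex Gaussian random variable ($u \sim \mathcal{CN}(0,1)$) and $\eta$ and $\xi$ are the thresholds used to define the truncated Gaussian distribution.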

As in the previous bounds, this lower bound depends on the conditional channel distribution (through the expectation of the conditional channel covariance matrix). Note that the covariance matrix itself also depends on the definition of the truncated Gaussian distribution, i.e., on the parameters $\eta$ and $\xi$.

Proof of Lemma 3.a: See Appendix A.

VII. Channel estimation 

In this section we analyze the conditional distribution of the channel given the past transmitted 
and received symbols, and provide the bounds required for the capacity evaluation. 

Using Assumption III (no feedback) the transmitted signal is statistically independent of the channel, and hence, given the input symbols $X_0^{k-1}$, the output symbols $Y_0^{k-1}$ are jointly Gaussian with the channel $H_k$. Thus, the desired distribution can be analyzed through the theory of optimal linear estimation. The auto-covariance of the conditioned output symbols, $Y_0^{k-1}|X_0^{k-1}$, is:

$$C_{Y_0^{k-1}|X_0^{k-1}} = N\,\mathrm{diag}(X_0^{k-1})\,C_{H_0^{k-1}}\,\mathrm{diag}(X_0^{k-1})^\dagger + I \qquad (81)$$

and their cross covariance matrix with $H_k$ is:

$$C_{Y_0^{k-1},H_k|X_0^{k-1}} = \sqrt{N}\,\mathrm{diag}(X_0^{k-1})\,C_{H_0^{k-1},H_k} \qquad (82)$$

where $C_{H_0^{k-1},H_k}$ is the sub-matrix of $C_{H_0^k}$ describing the cross covariance between the current channel vector and the past channel vectors. Therefore, $H_k|X_0^{k-1},Y_0^{k-1}$ has a complex Gaussian distribution, $H_k|X_0^{k-1},Y_0^{k-1} \sim \mathcal{CN}(\mu_k,\mathcal{E}_k)$, where the conditional channel mean, $\mu_k$, and the conditional channel covariance matrix, $\mathcal{E}_k$, are given by [32]:

$$\mu_k = E[H_k] + \sqrt{N}\,C_{H_0^{k-1},H_k}^\dagger\,\mathrm{diag}(X_0^{k-1})^\dagger\left(N\,\mathrm{diag}(X_0^{k-1})\,C_{H_0^{k-1}}\,\mathrm{diag}(X_0^{k-1})^\dagger + I\right)^{-1}\left(Y_0^{k-1} - \sqrt{N}\,\mathrm{diag}(X_0^{k-1})\,E[H_0^{k-1}]\right) \qquad (83)$$

$$\mathcal{E}_k = C_{H_0} - N\,C_{H_0^{k-1},H_k}^\dagger\,\mathrm{diag}(X_0^{k-1})^\dagger\left(N\,\mathrm{diag}(X_0^{k-1})\,C_{H_0^{k-1}}\,\mathrm{diag}(X_0^{k-1})^\dagger + I\right)^{-1}\mathrm{diag}(X_0^{k-1})\,C_{H_0^{k-1},H_k}. \qquad (84)$$

We note that the conditional channel mean is characterized by the conditional channel covariance matrix, $\mathcal{E}_k$, since we can write the distribution of the channel mean as $\mu_k|X_0^{k-1} \sim \mathcal{CN}\left(E[H_k], C_{H_0} - \mathcal{E}_k\right)$. Therefore, we will focus hereon on the characterization of the conditional channel covariance matrix $\mathcal{E}_k$. In this section we characterize the bounds on $(\mathcal{E}_k)_{m,m}$ and its moments in terms of the effective coherence time.

Lemma 1.b: With a peak power constraint, the conditional channel covariance matrix is lower bounded by:

lim min inf {Sk)m,m > ^rr „ , ^ , — ^ -• (85) 

fe^oo m x^iexcjvpj) iVi + l^(T,(p,)-iV) 

With a quadratic power constraint, the conditional channel covariance matrix satisfies: 



lim sup E 



A^ min inf 




< a 1 



1 + s^(T,(p,) - N) 
Proof of Lemma 1.b: In Appendix B we prove the following inequality:

$$(\mathcal{E}_k)_{m,m} \ge \frac{1}{N}\left[1 - D_k^\dagger\mathcal{P}_{k-1}\left(I + A_{k-1}\mathcal{P}_{k-1}\right)^{-1}D_k\right]. \qquad (87)$$

The proof is based on showing that the conditional covariance is minimized using $S_{X_0^{k-1}} = \mathcal{P}_{k-1}\otimes J_m$, where $J_m$ is an $N \times N$ matrix with only one nonzero element, the $m$-th element on the diagonal, which equals 1.

The first part of the lemma follows directly from (87) by noting that the conditional covariance matrix is a nonincreasing function of each element in $\mathcal{P}_{k-1}$. Therefore, it is minimized when all elements take their maximal value ($\mathcal{P}_{k-1} = N\rho_xI$), so that

$$\lim_{k\to\infty}\min_m\inf_{X_0^{k-1}\in\mathcal{X}(N\rho_xI)}(\mathcal{E}_k)_{m,m} \ge \frac{1}{N}\left[1 - \lim_{k\to\infty}N\rho_xD_k^\dagger\left(I + N\rho_xA_{k-1}\right)^{-1}D_k\right]. \qquad (88)$$

Using the definition of the effective coherence time, (12), and the matrix inversion lemma leads to (85) and completes the first part of the proof.

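For the quadratic case the bounding is slightly more complicated, and we search for bounds on moments of the conditional covariance matrix that will hold for any distribution of the transmission powers, $\mathcal{P}_{k-1}$. To do so, we use the same derivation as in (B.4), and apply it to (87) with $\mathcal{P}_{k-1} = E[\mathcal{P}_{k-1}] + V$. We get:

$$\begin{aligned}
(\mathcal{E}_k)_{m,m} \ge \frac{1}{N} &- \frac{1}{N}D_k^\dagger E[\mathcal{P}_{k-1}]\left(I + A_{k-1}E[\mathcal{P}_{k-1}]\right)^{-1}D_k \\
&- \frac{1}{N}D_k^\dagger\left(I + E[\mathcal{P}_{k-1}]A_{k-1}\right)^{-1}V\left(I + A_{k-1}E[\mathcal{P}_{k-1}]\right)^{-1}D_k. \qquad (89)
\end{aligned}$$

Taking the expectation of the square of (89) and using $(\mathcal{E}_k)_{m,m} \le \frac{1}{N}$, we have:

$$E\left[\left(1 - N(\mathcal{E}_k)_{m,m}\right)^2\right] \le \left(D_k^\dagger E[\mathcal{P}_{k-1}]\left(I + A_{k-1}E[\mathcal{P}_{k-1}]\right)^{-1}D_k\right)^2 + E\left[v^\dagger Vv\,v^\dagger Vv\right] \qquad (90)$$

where $v = \left(I + A_{k-1}E[\mathcal{P}_{k-1}]\right)^{-1}D_k$. Focusing on the second term in the right-hand side of (90) we have:

$$\begin{aligned}
E\left[v^\dagger Vv\,v^\dagger Vv\right] &= E\left[\sum_a\sum_bv_a^*V_{a,a}v_a\,v_b^*V_{b,b}v_b\right] \\
&= \sum_a\sum_b|v_a|^2|v_b|^2E\left[V_{a,a}V_{b,b}\right] \\
&\le \sum_a\sum_b|v_a|^2|v_b|^2\sqrt{E\left[V_{a,a}^2\right]E\left[V_{b,b}^2\right]} \\
&\le (\alpha-1)N^2\rho_x^2\left(v^\dagger v\right)^2 \qquad (91)
\end{aligned}$$

where we use $V_{b,b} = p_b - E[p_b]$, and hence $E\left[V_{b,b}^2\right] \le (\alpha-1)N^2\rho_x^2$. Substituting back in (90), we observe that the right-hand side of (90) is maximized with $E[\mathcal{P}_{k-1}] = N\rho_xI$ and we get:

$$\begin{aligned}
E\left[\left(1 - N(\mathcal{E}_k)_{m,m}\right)^2\right] &\le N^2\rho_x^2\left(D_k^\dagger\left(I + N\rho_xA_{k-1}\right)^{-1}D_k\right)^2 + (\alpha-1)N^2\rho_x^2\left(D_k^\dagger\left(I + N\rho_xA_{k-1}\right)^{-2}D_k\right)^2 \\
&\le N^2\rho_x^2\left(D_k^\dagger\left(I + N\rho_xA_{k-1}\right)^{-1}D_k\right)^2 + (\alpha-1)N^2\rho_x^2\left(D_k^\dagger\left(I + N\rho_xA_{k-1}\right)^{-1}D_k\right)^2 \\
&= \alpha N^2\rho_x^2\left(D_k^\dagger\left(I + N\rho_xA_{k-1}\right)^{-1}D_k\right)^2 \qquad (92)
\end{aligned}$$

where the second inequality in (92) uses the fact that all eigenvalues of $I + N\rho_xA_{k-1}$ are larger than or equal to 1, and therefore $\left(I + N\rho_xA_{k-1}\right)^{-2} \le \left(I + N\rho_xA_{k-1}\right)^{-1}$. Using the effective coherence time definition results in (86) and completes the proof of the lemma. ■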
Lemma 2.b: Using QPSK modulation, as defined in (66), for each active frequency bin ($0 \le m \le r-1$) the diagonal element of the conditional channel covariance matrix is upper bounded by:



lim (S 



7Vi+lEl(f,(/3p,)-iV) 



r = N 



fc— >oo 



.)™,™ < <! --T^y^^^^p^ (93) 

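Proof of Lemma 2.b: Starting with the wideband QPSK modulation ($r = N$), the transmission spectrum given $g = 1$ is equal to $S_{X_0^{k-1}} = \beta\rho_xI$. Substituting in the estimation error covariance matrix, (84) (see also (B.3)), we have:

$$\mathcal{E}_k = C_{H_0} - N\beta\rho_x\,C_{H_0^{k-1},H_k}^\dagger\left(N\beta\rho_x\,C_{H_0^{k-1}} + I\right)^{-1}C_{H_0^{k-1},H_k}. \qquad (94)$$

Writing the estimation error covariance matrix, (94), in the time domain, and substituting (7) we have:

$$\epsilon_k = F\mathcal{E}_kF^\dagger = C_{h_0} - N\beta\rho_x\,D_k^\dagger\otimes C_{h_0}\left(N\beta\rho_x\,A_{k-1}\otimes C_{h_0} + I\right)^{-1}D_k\otimes C_{h_0}. \qquad (95)$$

Note that covariance matrices are Hermitian so that $C_{h_0}^\dagger = C_{h_0}$. In Appendix C we show that this is a diagonal matrix in which the $l$-th element on the diagonal is given by:

$$(\epsilon_k)_{l,l} = (C_{h_0})_{l,l} - N\beta\rho_x\,D_k^\dagger(C_{h_0})_{l,l}\left(N\beta\rho_x\,A_{k-1}(C_{h_0})_{l,l} + I\right)^{-1}D_k(C_{h_0})_{l,l}. \qquad (96)$$

For any channel tap such that $(C_{h_0})_{l,l} > 0$ we can use the matrix inversion lemma to write the inverse of the estimation error as: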



and taking the limit as k goes to infinity: 



Ak-i - DkDl 



I) 'Dj, 



(97) 



^lim (S,);/ = (Ch,);/ + ^ (Te (/3p.(Ch,)m) - N^ 



(98) 



Going back to the frequency domain, the properties of the DFT guarantee that all elements 
on the diagonal of Sk are equal. We therefore have: 

IV <! Mm fc,, !> < — N 

N 



lim (S 



fc— >oo 



k)m,m 



i:ic::y,,^o{Cuxt+p'f(Tc{Pp.)-N 



■ (99) 
Using the concavity of the function $(a + 1/x)^{-1}$ for positive $a$ and $\mathrm{Tr}[C_{h_0}] = 1$ proves the first (upper) part of (93).

For $r < N$ we upper bound the diagonal element of the conditional channel covariance matrix by the performance of a sub-optimal estimation scheme. This estimation scheme estimates the channel at the $m$-th frequency bin based only on data transmitted and received over the $m$-th frequency bin. Inspecting (94) and taking only the elements that correspond to the $m$-th frequency bin ($0 \le m \le r-1$) the conditional channel variance is upper bounded by:



-1 



1 IN \ I N 
IS M r \ r 



(100) 



where we took into account the normalization $(C_{H_0})_{m,m} = \frac{1}{N}$ and the fact that the transmitted power for an active bin, given $g = 1$, is $\beta\frac{N}{r}\rho_x$. Using the matrix inversion lemma and the definition of the effective coherence time, (12), results in the second part of (93) and completes the proof of the lemma. ■

Lemma 3.b: Using truncated Gaussian input distribution, as defined in (78), for each active frequency bin ($0 \le m \le r-1$) the mean of the diagonal element of the conditional channel covariance matrix is upper bounded for the quadratic power constraint by:



lim E 

k—^oo 



' k)m,m 



N 



Ppx 



< < 



"XT 2L 



fiyx 



^N 



N 



liyx 



-L\ 2r i^-: 



/3pa: 



-N 



and for peak power constraint by: 



lim E 



' k)m,m 



N 



< < 



Px 1-eV i 1 I rp 
' « log{l + i)2i' ^'^ 



Vx_ l-e'5-« 



-N 



N 



l-eV-i 



« log(l + i) 2'- 



eV-i 



"■« log(l + i) 



-N 



r = N 
r < N 

r = N 
r < N 



(101) 



(102) 



Proof of Lemma 3.b: Starting from the wideband truncated Gaussian distribution ($r = N$), the power transmitted in all frequency bins is strictly positive. We therefore can write the conditional channel covariance matrix (84) (see also (B.3)), as:

$$\mathcal{E}_k = f_1\left(\left(NS_{X_0^{k-1}}\right)^{-1}\right) = C_{H_0} - C_{H_0^{k-1},H_k}^\dagger\left(C_{H_0^{k-1}} + \frac{1}{N}S_{X_0^{k-1}}^{-1}\right)^{-1}C_{H_0^{k-1},H_k}. \qquad (103)$$

Next we show that $f_1(\Lambda)$ is a concave function of each of the elements in the strictly positive diagonal matrix $\Lambda$. Denote by $I_i$ the vector which is all zeros except for a 1 in the $i$-th element. The derivative of $f_1(\Lambda)$ with respect to $(\Lambda)_{i,i}$ is given by:

$$\frac{\partial f_1(\Lambda)}{\partial(\Lambda)_{i,i}} = C_{H_0^{k-1},H_k}^\dagger\left(C_{H_0^{k-1}} + \Lambda\right)^{-1}I_iI_i^\dagger\left(C_{H_0^{k-1}} + \Lambda\right)^{-1}C_{H_0^{k-1},H_k} \qquad (104)$$

which is a positive semi-definite matrix. The second derivative is given by:

$$\frac{\partial^2f_1(\Lambda)}{\partial(\Lambda)_{i,i}^2} = -2\,C_{H_0^{k-1},H_k}^\dagger\left(C_{H_0^{k-1}} + \Lambda\right)^{-1}I_iI_i^\dagger\left(C_{H_0^{k-1}} + \Lambda\right)^{-1}I_iI_i^\dagger\left(C_{H_0^{k-1}} + \Lambda\right)^{-1}C_{H_0^{k-1},H_k} \qquad (105)$$

which is a negative semi-definite matrix, and guarantees that the function $f_1(\Lambda)$ is a concave function of $(\Lambda)_{i,i}$.

In order to avoid the differentiation with respect to the whole matrix $\Lambda$ we considered only the derivatives with respect to each single element in it. But, as we showed that the function $f_1(\Lambda)$ is a concave function of each of the (diagonal) elements in $\Lambda$ regardless of the other elements, we can now apply Jensen's inequality serially on one matrix element at a time. Going over all matrix elements we get:

$$E[\mathcal{E}_k] = E\left[f_1\left(\left(NS_{X_0^{k-1}}\right)^{-1}\right)\right] \le C_{H_0} - C_{H_0^{k-1},H_k}^\dagger\left(C_{H_0^{k-1}} + \frac{1}{N}E\left[S_{X_0^{k-1}}^{-1}\right]\right)^{-1}C_{H_0^{k-1},H_k}. \qquad (106)$$



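Taking into account the definition of the transmitted signal, (78), the expectation of the inverse of the transmitted spectrum, given $g = 1$, is:

$$E\left[\frac{1}{|x_{k,m}|^2}\,\Big|\,g=1\right] = \frac{\rho_z}{c^2}\,\frac{1}{\theta}\int_\eta^\xi\frac{1}{x}e^{-x}\,dx \le \frac{\rho_z}{c^2}\,\frac{E_1(\eta)}{e^{-\eta} - e^{-\xi}} \le \frac{\rho_z}{c^2}\,\frac{e^{-\eta}\log\left(1+\frac{1}{\eta}\right)}{e^{-\eta} - e^{-\xi}} \qquad (107)$$

where $E_1(\cdot)$ is the exponential integral function, and the second inequality used $E_1(a) < e^{-a}\log\left(1+\frac{1}{a}\right)$ [33].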
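
The inequality on the exponential integral used in (107) is easy to verify numerically. The sketch below is an illustration rather than part of the proof: it evaluates $E_1(x)$ through its standard power series $E_1(x) = -\gamma - \ln x - \sum_{k\ge1}(-x)^k/(k\,k!)$, which is adequate in double precision for moderate arguments (roughly $0 < x \le 5$), and compares it with the bound $e^{-x}\log(1+1/x)$. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, terms=60):
    """E1(x) = integral_x^inf e^{-t}/t dt via the convergent power series.
    Intended for moderate arguments, about 0 < x <= 5."""
    total = 0.0
    term = 1.0  # holds (-x)^k / k! after k iterations
    for k in range(1, terms + 1):
        term *= -x / k
        total += term / k
    return -EULER_GAMMA - math.log(x) - total


def e1_upper_bound(x):
    """The bound e^{-x} * log(1 + 1/x) used in (107)."""
    return math.exp(-x) * math.log1p(1.0 / x)


if __name__ == "__main__":
    for x in (0.05, 0.5, 1.0, 2.0, 4.0):
        print(x, exp_integral_e1(x), e1_upper_bound(x))
```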

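Substituting (107) in (106) results in:

$$E[\mathcal{E}_k] \le C_{H_0} - N\zeta\,C_{H_0^{k-1},H_k}^\dagger\left(N\zeta\,C_{H_0^{k-1}} + I\right)^{-1}C_{H_0^{k-1},H_k} \qquad (108)$$

where for the quadratic power constraint, substituting (76), (A.21), $r = N$ and $\xi = \infty$ we have:

$$\zeta = \frac{\beta\rho_x}{(1+\eta)\log\left(1+\frac{1}{\eta}\right)}. \qquad (109)$$

For the peak power constraint, substituting (A.25) and $\beta = 1$:

$$\zeta = \frac{\rho_x}{\xi}\,\frac{e^{-\eta} - e^{-\xi}}{e^{-\eta}\log\left(1+\frac{1}{\eta}\right)}. \qquad (110)$$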
Comparing (108) with (94) we note that the only difference is the change of the factor $\beta\rho_x$ with the factor $\zeta$. Hence, repeating the derivation of Lemma 2.b (Equations (95) to (99)) with the replaced factor results in:

Wm. E {£k)m,m < ITT "^7 T' (HI) 

Substituting the proper values of $\zeta$ in (111) leads to the upper parts of (101) and (102) and completes the first part of the lemma.

Next, consider $r < N$. As in the QPSK case, we turn to the suboptimal single frequency bin estimation. In this case, the estimation error in the $m$-th frequency bin ($0 \le m \le r-1$) satisfies:

$$(\mathcal{E}_k)_{m,m} \le \frac{1}{N} - \frac{1}{N^2}D_k^\dagger\left(\frac{1}{N}A_{k-1} + \frac{1}{N}\Lambda_m^{-1}\right)^{-1}D_k \qquad (112)$$

where the diagonal matrix $\Lambda_m$ is generated by taking from the matrix $S_{X_0^{k-1}}$ only the terms corresponding to the $m$-th element of each OFDM symbol.

Comparing (112) with (103), they have exactly the same structure, and hence we can repeat the first part of the proof (Equations (104) to (108)), adjusting only the corresponding matrices. The resulting bound is:



E 



Sk] < ^-^cDiic^k-1 + \y'Dk. (113) 



with the values of $\zeta$ given by (109) and (110).

Using the matrix inversion lemma $(A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}$ and the definition of the effective coherence time, (12), results in the lower parts of (101) and (102) and completes the proof of the lemma. ■

VIII. Conclusions 

In this paper we derived bounds on the capacity of the noncoherent stationary underspread complex Gaussian OFDM-WSSUS channel with a peak power constraint or a quadratic power constraint. The bounds are characterized only by the system signal-to-noise ratio (SNR) $\rho_x$ and by a newly defined effective coherence time $T_c(\rho_x)$, which measures the capability to estimate the channel and is a nonincreasing function of the system SNR. The bounds show that:

• The coherent channel capacity is achievable if $\rho_xT_c(\rho_x) \gg 1$ and $T_c(\rho_x) \gg N$.
• For the peak power constraint, if the channel has zero mean and p^ ^ ^/Tc{Px) the capacity
is approximately ^plTc{px)- 

• For the quadratic power constraint if the channel has zero mean and p^ <^ 1/Tc{px) the 
capacity ranges between ^plTc{px) and ^plTc{px) (typically the effective coherence time 
changes slowly with SNR and the higher bound better characterizes the capacity). If the 
limit Tc„ = \irap^^QTc{px) exists, the capacity in the very low SNR limit is ^plTc{px)- 

• As long as the effective coherence time is large enough ($T_c(\rho_x) \gg N$) the channel capacity is achievable using a receiver that performs channel estimation based on past received and transmitted symbols, and decodes based on minimum Euclidean distance.

The paper presented an initial study of the effective coherence time and plotted it for the auto-regressive AR1 channel. Owing to its relevance in the characterization of the channel capacity, it is important to better understand its properties. Future work is required to characterize the effective coherence time of commonly used channels, and to study the relationship between the effective coherence time and channel capacity in more complicated channel structures.

Appendix A 
Proof of Lemma 3. a 

As stated in the proof of Lemma 2.a, we can safely assume that $g$ can be decoded with no error. We reuse the derivation of Lemma 2.a up to Equation (72), and focus on the mutual information of a single frequency bin:

$$C \ge \lim_{k\to\infty}\frac{1}{N\beta}\sum_{m=0}^{r-1}I(z_{k,m};y_{k,m}|Z_0^{k-1},Y_0^{k-1},g=1).$$

In order to lower bound the mutual information we use the generalized mutual information (GMI, see for example [24]). The derivation of the GMI can give a lower bound on the mutual information evaluated by:

$$I(z;y) \ge E\left[\log\frac{e^{-s\,d(z,y)}}{E\left[e^{-s\,d(z',y)}\,|\,y\right]}\right] \qquad (A.1)$$

where $s$ is any positive constant, $d(z,y)$ is any distance metric that defines the operation of the receiver (which chooses the codeword that minimizes the distance to the received signal), and $z'$ is a random variable with the same distribution as $z$ but statistically independent of $z$ and $y$.
We next evaluate such a lower bound based on the transmission of a truncated Gaussian distributed symbol over a flat fading Gaussian channel. Consider the channel:

$$y = hz + v \qquad (A.2)$$

where $z \sim \mathcal{TCN}(\eta,\xi)$ is the input symbol with truncated Gaussian distribution, $h$ is the Gaussian channel with $h \sim \mathcal{CN}(\mu,\epsilon)$ and $v$ is an additive white proper complex Gaussian noise with $v \sim \mathcal{CN}(0,1)$. We assume a conventional coherent receiver, i.e., the distance metric is:

$$d(z,y) = |y - \mu z|^2. \qquad (A.3)$$



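The more difficult part in the evaluation of (A.1) is typically the evaluation of the expectation in the denominator. Luckily, we can use the fact that the input distribution is close to the Gaussian distribution. For a proper complex Gaussian random variable $u \sim \mathcal{CN}(0,1)$ and any constants $\upsilon$ and $a$ we have:

$$E\left[e^{-|\upsilon - au|^2}\right] = \frac{1}{1+|a|^2}\,e^{-\frac{|\upsilon|^2}{1+|a|^2}}. \qquad (A.4)$$

As the expression inside the expectation is always positive, we can lower bound this expectation using the truncated Gaussian distribution by:

$$\frac{1}{1+|a|^2}\,e^{-\frac{|\upsilon|^2}{1+|a|^2}} \ge \theta\,E\left[e^{-|\upsilon - au|^2}\,\Big|\,\eta < |u|^2 < \xi\right] \qquad (A.5)$$

where $\theta = e^{-\eta} - e^{-\xi}$ is the probability that $\eta < |u|^2 < \xi$. Substituting $\upsilon = \sqrt{s}\,y$ and $a = \sqrt{s}\,\mu$, we have for a truncated Gaussian distributed $z'$:

$$E\left[e^{-s\,d(z',y)}\,|\,y\right] \le \frac{1}{e^{-\eta} - e^{-\xi}}\,\frac{1}{1+s|\mu|^2}\,e^{-\frac{s|y|^2}{1+s|\mu|^2}}. \qquad (A.6)$$

Substituting (A.6) in (A.1), the resulting bound is:

$$\begin{aligned}
I(z;y) &\ge \log\left(e^{-\eta} - e^{-\xi}\right) + \log\left(1+s|\mu|^2\right) + E\left[-s|y-\mu z|^2 + \frac{s|y|^2}{1+s|\mu|^2}\right] \\
&= \log\left(e^{-\eta} - e^{-\xi}\right) + \log\left(1+s|\mu|^2\right) - s(1+\rho_z\epsilon) + \frac{s\left(1+\rho_z\epsilon+\rho_z|\mu|^2\right)}{1+s|\mu|^2}. \qquad (A.7)
\end{aligned}$$

We arbitrarily choose $s$ to zero the last 2 terms in (A.7), i.e.:

$$s = \frac{\rho_z}{1+\rho_z\epsilon} \qquad (A.8)$$

which results in:

$$I(z;y) \ge \log\left(e^{-\eta} - e^{-\xi}\right) + \log\left(1 + \frac{\rho_z|\mu|^2}{1+\rho_z\epsilon}\right). \qquad (A.9)$$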
To use this bound for the actual channel at hand and the modulation described by (78), we substitute $\mu = c\,\mu_{k,m}\sqrt{N/\rho_z}$ and $\epsilon = Nc^2(\mathcal{E}_k)_{m,m}/\rho_z$, which results in the bound:

$$C \ge \lim_{k\to\infty}\frac{1}{N\beta}\sum_{m=0}^{r-1}E\left[\log\left(1 + \frac{Nc^2|\mu_{k,m}|^2}{1+Nc^2(\mathcal{E}_k)_{m,m}}\right)\right] + \frac{r}{N\beta}\log\left(e^{-\eta} - e^{-\xi}\right). \qquad (A.10)$$

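Next, we show that (A.10) is convex with respect to $(\mathcal{E}_k)_{m,m}$, and hence can be lower bounded using the expectation of $(\mathcal{E}_k)_{m,m}$. To do so we use the distribution of $\mu_{k,m}$, (49). At this point we also use $E[h_{k,m}] = 0$, as was stated in the body of the lemma. We introduce a new proper complex Gaussian random variable, $u \sim \mathcal{CN}(0,1)$, statistically independent of $(\mathcal{E}_k)_{m,m}$, and note that the distribution of $\mu_{k,m}$ is identical to the distribution of $u\sqrt{\frac{1}{N} - (\mathcal{E}_k)_{m,m}}$. Using the random variable $u$ we can rewrite (A.10) as:

$$\begin{aligned}
C &\ge \lim_{k\to\infty}\frac{1}{N\beta}\sum_{m=0}^{r-1}E\left[\log\left(1 + \frac{Nc^2\left(\frac{1}{N} - (\mathcal{E}_k)_{m,m}\right)|u|^2}{1+Nc^2(\mathcal{E}_k)_{m,m}}\right)\right] + \frac{r}{N\beta}\log\left(e^{-\eta} - e^{-\xi}\right) \\
&= \lim_{k\to\infty}\frac{1}{N\beta}\sum_{m=0}^{r-1}E\left[E\left[\log\left(1 + \frac{Nc^2\left(\frac{1}{N} - (\mathcal{E}_k)_{m,m}\right)|u|^2}{1+Nc^2(\mathcal{E}_k)_{m,m}}\right)\Big|\,(\mathcal{E}_k)_{m,m}\right]\right] + \frac{r}{N\beta}\log\left(e^{-\eta} - e^{-\xi}\right) \qquad (A.11)
\end{aligned}$$

where in the second equation we used the law of total expectations, and the inner expectation is taken only with respect to $u$. We define the function:

$$f(\epsilon) = E\left[\log\left(1 + \frac{Nc^2\left(\frac{1}{N} - \epsilon\right)|u|^2}{1+Nc^2\epsilon}\right)\right] \qquad (A.12)$$

and rewrite (A.11) as:

$$C \ge \lim_{k\to\infty}\frac{1}{N\beta}\sum_{m=0}^{r-1}E\left[f\left((\mathcal{E}_k)_{m,m}\right)\right] + \frac{r}{N\beta}\log\left(e^{-\eta} - e^{-\xi}\right). \qquad (A.13)$$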



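We next show that $f(\epsilon)$ is convex for $0 \le \epsilon \le \frac{1}{N}$ by proving that its second derivative with respect to $\epsilon$ is nonnegative. The first derivative is given by:

$$\begin{aligned}
\frac{df(\epsilon)}{d\epsilon} &= E\left[\frac{-Nc^2|u|^2\left(1+Nc^2\epsilon\right) - Nc^2\left(Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2\right)}{\left(1+Nc^2\epsilon+Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2\right)\left(1+Nc^2\epsilon\right)}\right] \\
&= Nc^2E\left[\frac{-(1+c^2)|u|^2}{\left(1+Nc^2\epsilon+Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2\right)\left(1+Nc^2\epsilon\right)}\right]. \qquad (A.14)
\end{aligned}$$

The second derivative is:

$$\begin{aligned}
\frac{d^2f(\epsilon)}{d\epsilon^2} &= N^2c^4E\left[\frac{(1+c^2)|u|^2\left(1-|u|^2\right)\left(1+Nc^2\epsilon\right)}{\left(1+Nc^2\epsilon+Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2\right)^2\left(1+Nc^2\epsilon\right)^2}\right] \\
&\quad+ N^2c^4E\left[\frac{(1+c^2)|u|^2\left(1+Nc^2\epsilon+Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2\right)}{\left(1+Nc^2\epsilon+Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2\right)^2\left(1+Nc^2\epsilon\right)^2}\right]. \qquad (A.15)
\end{aligned}$$

Inspecting the expectation in the second line of (A.15), the term inside the expectation is always positive. Using $\epsilon \le \frac{1}{N}$ we can lower bound this expectation by removing the term $Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2$ from the numerator. The resulting expectation is quite similar to the expectation in the first line of (A.15). Combining the two expectations results in:

$$\frac{d^2f(\epsilon)}{d\epsilon^2} \ge N^2c^4E\left[\frac{(1+c^2)|u|^2\left(2-|u|^2\right)}{\left(1+Nc^2\epsilon+Nc^2\left(\frac{1}{N}-\epsilon\right)|u|^2\right)^2\left(1+Nc^2\epsilon\right)}\right]. \qquad (A.16)$$

We next use the inequality $(a-x)/(b+x) \ge (a-x)/(b+a)$ which holds for $a, x > -b$. Slightly rewriting this inequality we have $(2-x)/(c+dx) \ge (2-x)/(c+2d)$, and using it in (A.16) we have:

$$\frac{d^2f(\epsilon)}{d\epsilon^2} \ge \frac{N^2c^4(1+c^2)}{\left(1+Nc^2\epsilon+2Nc^2\left(\frac{1}{N}-\epsilon\right)\right)^2\left(1+Nc^2\epsilon\right)}\,E\left[|u|^2\left(2-|u|^2\right)\right]. \qquad (A.17)$$

To complete this part we note that the multiplicative term in (A.17) is positive, and the expectation is easily evaluated using $E[|u|^2] = 1$ and $E[|u|^4] = 2$. Thus, the second derivative is always nonnegative:

$$\frac{d^2f(\epsilon)}{d\epsilon^2} \ge 0. \qquad (A.18)$$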



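Equation (A.18) shows that the function $f(\epsilon)$ is convex. Using Jensen's inequality we have:

$$E\left[f\left((\mathcal{E}_k)_{m,m}\right)\right] \ge f\left(E\left[(\mathcal{E}_k)_{m,m}\right]\right). \qquad (A.19)$$

Substituting (A.19) and (A.12) in (A.13) we get:

$$C \ge \lim_{k\to\infty}\frac{1}{N\beta}\sum_{m=0}^{r-1}E\left[\log\left(1 + \frac{Nc^2\left(\frac{1}{N} - E\left[(\mathcal{E}_k)_{m,m}\right]\right)|u|^2}{1+Nc^2E\left[(\mathcal{E}_k)_{m,m}\right]}\right)\right] + \frac{r}{N\beta}\log\left(e^{-\eta} - e^{-\xi}\right). \qquad (A.20)$$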

For the last stage of the proof we need to consider the different power constraints. For the quadratic constraint we set $\xi = \infty$ and $c$ in (78) to:

$$c = \sqrt{\beta\frac{N}{r}\rho_x}. \qquad (A.21)$$

Using (78), (76) and (77) we have:

$$E\left[X_k^\dagger X_k\right] = N\rho_x \qquad (A.22)$$

$$E\left[\left(X_k^\dagger X_k\right)^2\right] = \beta N^2\rho_x^2\,\frac{r(r-1)(1+\eta)^2 + r(2+\eta^2+2\eta)}{r^2(1+\eta)^2} = \beta N^2\rho_x^2\left(1 + \frac{1}{r(1+\eta)^2}\right). \qquad (A.23)$$

Adding to (A.20) the requirement to satisfy the quadratic power constraint results in (79) and proves the first part of the lemma.

For the peak power constraint we need to satisfy:

$$\frac{c^2}{\rho_z}\,r\xi \le N\rho_x. \qquad (A.24)$$

We thus set $\beta = 1$ and

$$c^2 = \frac{N\rho_x\rho_z}{r\xi} \qquad (A.25)$$

which leads to (80) and completes the proof of the lemma.



Appendix B 
Minimization of the conditional covariance: Proof of Equation (87) 

We search for the minimal estimation error in the $m$-th frequency bin given the transmitted power in each symbol. We use the following inequality which holds for the square matrices $D$, $E$ and $F$ if $F$ and $D + E$ are positive semi-definite:

$$(D+E)\left(F(D+E)+I\right)^{-1} \le D(FD+I)^{-1} + (DF+I)^{-1}E(FD+I)^{-1}, \qquad (B.1)$$

where $A \ge B$ means $A - B$ is positive semi-definite. To prove the inequality we write:

$$\begin{aligned}
(D+E)\left(F(D+E)+I\right)^{-1} &= (D+E)(FD+I)^{-1} - (D+E)\left(F(D+E)+I\right)^{-1}FE(FD+I)^{-1} \\
&= D(FD+I)^{-1} + \left[I - (D+E)F\left((D+E)F+I\right)^{-1}\right]E(FD+I)^{-1} \\
&= D(FD+I)^{-1} + \left((D+E)F+I\right)^{-1}E(FD+I)^{-1} \\
&= D(FD+I)^{-1} + (DF+I)^{-1}E(FD+I)^{-1} \\
&\quad- (DF+I)^{-1}EF\left((D+E)F+I\right)^{-1}E(FD+I)^{-1}, \qquad (B.2)
\end{aligned}$$

where the first and fourth equalities use the identity $(A+B)^{-1} = A^{-1} - (A+B)^{-1}BA^{-1}$ and its transpose respectively, which hold for any nonsingular matrix $A$, as long as $(A+B)^{-1}$ exists. The second equality uses the identity $(I+BA)^{-1}B = B(I+AB)^{-1}$, which is obvious if the matrix $B$ is nonsingular, but holds for any positive semi-definite matrix $B$ as long as the inverse of $(I+BA)$ exists. The inequality in (B.1) results from the fact that the matrix $F\left((D+E)F+I\right)^{-1}$ is positive semi-definite.

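We use the identity $B^\dagger\left(I + BAB^\dagger\right)^{-1}B = B^\dagger B\left(I + AB^\dagger B\right)^{-1}$, which holds as long as the inverse exists, and rewrite the estimation error covariance (84) as:

$$\mathcal{E}_k = C_{H_0} - N\,C_{H_0^{k-1},H_k}^\dagger S_{X_0^{k-1}}\left(N\,C_{H_0^{k-1}}S_{X_0^{k-1}} + I\right)^{-1}C_{H_0^{k-1},H_k}. \qquad (B.3)$$

Setting $S_{X_0^{k-1}} = \mathcal{J}_m + \bar{\mathcal{J}}_m$ where $\mathcal{J}_m$ and $\bar{\mathcal{J}}_m$ are both diagonal, and using Inequality (B.1) results in:

$$\begin{aligned}
\mathcal{E}_k &= C_{H_0} - N\,C_{H_0^{k-1},H_k}^\dagger\left(\mathcal{J}_m + \bar{\mathcal{J}}_m\right)\left(N\,C_{H_0^{k-1}}\left(\mathcal{J}_m + \bar{\mathcal{J}}_m\right) + I\right)^{-1}C_{H_0^{k-1},H_k} \\
&\ge C_{H_0} - N\,C_{H_0^{k-1},H_k}^\dagger\mathcal{J}_m\left(N\,C_{H_0^{k-1}}\mathcal{J}_m + I\right)^{-1}C_{H_0^{k-1},H_k} \\
&\quad- N\,C_{H_0^{k-1},H_k}^\dagger\left(N\mathcal{J}_mC_{H_0^{k-1}} + I\right)^{-1}\bar{\mathcal{J}}_m\left(N\,C_{H_0^{k-1}}\mathcal{J}_m + I\right)^{-1}C_{H_0^{k-1},H_k}. \qquad (B.4)
\end{aligned}$$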

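Without loss of generality we will assume in this appendix that $m = 0$. Now assume that $\mathcal{J}_0 = \mathcal{P}_{k-1}\otimes J_0$ and $J_0 = \mathrm{diag}([1,0,\ldots,0])$. Also recall that $C_{H_0^{k-1}} = A_{k-1}\otimes C_{H_0}$. Using the identity $(A\otimes B)\cdot(C\otimes D) = AC\otimes BD$, the matrix inverse in the second line of (B.4) can be written as:

$$\left(N\,C_{H_0^{k-1}}\mathcal{J}_0 + I\right)^{-1} = \left(N\left(A_{k-1}\mathcal{P}_{k-1}\right)\otimes\left(C_{H_0}J_0\right) + I\right)^{-1}. \qquad (B.5)$$

Noting that $(C_{H_0}J_0)^2 = (C_{H_0})_{0,0}(C_{H_0}J_0)$, we use the identity:

$$(I + A\otimes B)^{-1} = I - \left((I + \theta A)^{-1}A\right)\otimes B \qquad (B.6)$$

which holds if the inverse exists and $B^2 = \theta B$, and can be easily verified by direct multiplication. Using Identity (B.6), the inverse in (B.5) can be written as:

$$\left(N\,C_{H_0^{k-1}}\mathcal{J}_0 + I\right)^{-1} = I - N\left[\left(I + NA_{k-1}\mathcal{P}_{k-1}(C_{H_0})_{0,0}\right)^{-1}A_{k-1}\mathcal{P}_{k-1}\right]\otimes\left(C_{H_0}J_0\right). \qquad (B.7)$$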
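
Identity (B.6) can be confirmed by direct multiplication on a small numerical example. The sketch below is illustrative only and makes two simplifying assumptions that are not in the paper: $A$ is taken to be a $2 \times 2$ real matrix (so that a closed-form inverse suffices), and $B$ is taken to be a rank-one matrix $vv^T$, for which $B^2 = \theta B$ with $\theta = v^Tv$. All helper names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product a (x) b."""
    return [[a[i][j] * b[p][q] for j in range(len(a[0])) for q in range(len(b[0]))]
            for i in range(len(a)) for p in range(len(b))]


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def kron_inverse_residual(a, v):
    """Largest entry of |(I + A(x)B)(I - ((I + theta A)^{-1} A)(x)B) - I|
    for a 2 x 2 matrix A and the rank-one matrix B = v v^T."""
    b = [[vi * vj for vj in v] for vi in v]
    theta = sum(vi * vi for vi in v)  # B^2 = theta * B
    scaled = [[theta * x for x in row] for row in a]
    m = matmul(inv2(add(eye(2), scaled)), a)
    n = 2 * len(v)
    left = add(eye(n), kron(a, b))
    right = add(eye(n), kron(m, b), sign=-1.0)
    prod = matmul(left, right)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(kron_inverse_residual([[2.0, 1.0], [1.0, 3.0]], [1.0, 2.0, 0.5]))
```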
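Substituting also $C_{H_0^{k-1},H_k} = D_k\otimes C_{H_0}$, we can write the multiplication:

$$\left(N\,C_{H_0^{k-1}}\mathcal{J}_0 + I\right)^{-1}C_{H_0^{k-1},H_k} = D_k\otimes C_{H_0} - N\left[\left(I + NA_{k-1}\mathcal{P}_{k-1}(C_{H_0})_{0,0}\right)^{-1}A_{k-1}\mathcal{P}_{k-1}D_k\right]\otimes\left(C_{H_0}J_0C_{H_0}\right). \qquad (B.8)$$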

As we test the estimation error in the first frequency bin, we need to evaluate the first column 
of the matrix (NC^k-iJo + I ) Cjjfeijj . This column is given by: 

VaN+b 

= {Dk)a{C^Jb,o - N ((\ + NAk.iVk-i{C^Jo,o) ^ Ak^iVk^iOA (Ch,>,o(Ch>,o 
= ^(«)(Ch>,o (B.9) 

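Now the top left element of the matrix described by the last line of (B.4) can be written as:

\begin{align}
-N \sum_{a=0}^{k-1} \sum_{b=0}^{N-1} |v_{aN+b}|^2 (\bar{J}_0)_{aN+b,aN+b}
&= -N \sum_{a=0}^{k-1} |\psi(a)|^2 \sum_{b=0}^{N-1} |(C_{H_0})_{b,0}|^2 (\bar{J}_0)_{aN+b,aN+b} \notag \\
&= -N \sum_{a=0}^{k-1} |\psi(a)|^2 \left[ |(C_{H_0})_{0,0}|^2 (\bar{J}_0)_{aN,aN} + \sum_{b=1}^{N-1} |(C_{H_0})_{b,0}|^2 (\bar{J}_0)_{aN+b,aN+b} \right] \notag \\
&\ge 0 \tag{B.10}
\end{align}

where we use the fact that $S_{x^{k-1}} = J_0 + \bar{J}_0$ and $\mathrm{Tr}(S_{x_n}) \le \rho$, and therefore $(\bar{J}_0)_{aN,aN} \le \sum_{b=0}^{N-1} (\bar{J}_0)_{aN+b,aN+b} = 0$. We also use the positive semi-definiteness of the matrix $C_{H_0}$ so
that $|(C_{H_0})_{0,0}|^2 \ge |(C_{H_0})_{b,0}|^2$.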

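We conclude that the bound on the estimation error for the first frequency bin is achieved
when $\bar{J}_0 = 0$. Using $\tilde{J}_0 C_{H_0} \tilde{J}_0 = (C_{H_0})_{0,0} \tilde{J}_0$ and (B.7) we evaluate the matrix:

\begin{align}
J_0 \left( N C_{H^{k-1}} J_0 + I \right)^{-1} &= P_{k-1} \otimes \tilde{J}_0 - \left[ P_{k-1} N \left( I + N A_{k-1} P_{k-1} (C_{H_0})_{0,0} \right)^{-1} A_{k-1} P_{k-1} (C_{H_0})_{0,0} \right] \otimes \tilde{J}_0 \notag \\
&= \left[ P_{k-1} \left( I + N A_{k-1} P_{k-1} (C_{H_0})_{0,0} \right)^{-1} \right] \otimes \tilde{J}_0. \tag{B.11}
\end{align}

Substituting (B.11) and $\bar{J}_0 = 0$ in (B.3), the bound is given by:

\begin{align}
(\Sigma_k)_{0,0} &\ge (C_{H_0})_{0,0} - N \left( (D_k^\dagger \otimes C_{H_0}) J_0 \left( N C_{H^{k-1}} J_0 + I \right)^{-1} (D_k \otimes C_{H_0}) \right)_{0,0} \notag \\
&= (C_{H_0})_{0,0} - N D_k^\dagger P_{k-1} \left( I + N A_{k-1} P_{k-1} (C_{H_0})_{0,0} \right)^{-1} D_k \, (C_{H_0} \tilde{J}_0 C_{H_0})_{0,0} \notag \\
&= (C_{H_0})_{0,0} - N (C_{H_0})_{0,0}^2 D_k^\dagger P_{k-1} \left( I + N A_{k-1} P_{k-1} (C_{H_0})_{0,0} \right)^{-1} D_k. \tag{B.12}
\end{align}

Using our normalization $(C_{H_0})_{0,0} = 1/N$, we get (87).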

Appendix C 
Derivation of Equation (96) 

In this appendix we present the derivation of Equation (96) from Equation (95). The easiest 
way to do so is to define a matrix modulo concatenation, and use its properties to simplify the 
derivation. Note that the matrix modulo concatenation can be seen as a permutation of a block 
diagonal matrix concatenation. Therefore the two concatenations share the same properties. Yet, 
for the problems at hand, the modulo concatenation is more convenient as the matrices we handle 
have the desired form. 

For the matrices of identical size $A_0, A_1, \ldots, A_{v-1}$ we define the modulo concatenation as
$A = \lfloor A_0, A_1, \ldots, A_{v-1} \rceil$ so that its elements satisfy:

\[ (A)_{k,l} = \begin{cases} 0 & k \bmod v \ne l \bmod v \\ (A_{k \bmod v})_{\lfloor k/v \rfloor, \lfloor l/v \rfloor} & k \bmod v = l \bmod v \end{cases} \tag{C.1} \]

i.e., the elements of the concatenated matrices are placed on the diagonals of the blocks of the
matrix $A$. If the size of the matrices is $r \times N$ then the size of their diagonal concatenation is
$vr \times vN$.

The modulo concatenation has the following useful properties: 

1) If $A = \lfloor A_0, A_1, \ldots, A_{v-1} \rceil$ and $B = \lfloor B_0, B_1, \ldots, B_{v-1} \rceil$ are the modulo concatenation of
$r \times N$ matrices, then $A + B = \lfloor A_0 + B_0, A_1 + B_1, \ldots, A_{v-1} + B_{v-1} \rceil$.

2) If $A = \lfloor A_0, A_1, \ldots, A_{v-1} \rceil$ is the modulo concatenation of $r \times N$ matrices, and $B =
\lfloor B_0, B_1, \ldots, B_{v-1} \rceil$ is the modulo concatenation of $N \times U$ matrices, then $AB =
\lfloor A_0 B_0, A_1 B_1, \ldots, A_{v-1} B_{v-1} \rceil$.

3) $I = \lfloor I, I, \ldots, I \rceil$.

4) If the inverse exists then $A^{-1} = \lfloor A_0^{-1}, A_1^{-1}, \ldots, A_{v-1}^{-1} \rceil$.

5) If $D = \mathrm{diag}([d_0, d_1, \ldots, d_{v-1}])$ is a diagonal matrix, then $A \otimes D = \lfloor A d_0, A d_1, \ldots, A d_{v-1} \rceil$.

Note that properties 1, 3 and 5 follow directly from the definition of modulo concatenation, and
property 4 follows from 2 and 3. Property 2 can be proved by permuting the matrices into
block diagonal matrices. However, for completeness we give here a short direct
proof:

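The $k, l$ element of the multiplication in property 2 is:

\begin{align}
(AB)_{k,l} &= \sum_{w=0}^{Nv-1} A_{k,w} B_{w,l} \notag \\
&= \sum_{w=0}^{N-1} A_{k, wv + (k \bmod v)} B_{wv + (k \bmod v), l} \tag{C.2}
\end{align}

which is 0 if $k \bmod v \ne l \bmod v$. If $k \bmod v = l \bmod v$, the $k, l$ element is:

\[ (AB)_{k,l} = \sum_{w=0}^{N-1} (A_{k \bmod v})_{\lfloor k/v \rfloor, w} (B_{k \bmod v})_{w, \lfloor l/v \rfloor} = (A_{k \bmod v} B_{k \bmod v})_{\lfloor k/v \rfloor, \lfloor l/v \rfloor} \tag{C.3} \]

which completes the proof.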
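The definition (C.1) and properties 2, 4 and 5 can be exercised on a small example. The following is a minimal sketch in standard-library Python; the function name `modcat`, the helper names, and the test matrices are hypothetical choices for illustration, with matrices stored as lists of rows and $v = 3$ blocks of size $2 \times 2$.

```python
from fractions import Fraction

def modcat(blocks):
    """Modulo concatenation of v equally sized r x n matrices, per (C.1)."""
    v, r, n = len(blocks), len(blocks[0]), len(blocks[0][0])
    return [[blocks[k % v][k // v][l // v] if k % v == l % v else 0
             for l in range(v * n)] for k in range(v * r)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]

def inverse(M):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical invertible 2 x 2 blocks (assumptions for this example only).
A_blocks = [[[1, 2], [3, 4]], [[2, 0], [1, 1]], [[1, 1], [0, 3]]]
B_blocks = [[[0, 1], [1, 1]], [[2, 1], [1, 0]], [[1, 0], [2, 1]]]
A, B = modcat(A_blocks), modcat(B_blocks)

# Property 2: the product of concatenations is the concatenation of products.
prop2 = matmul(A, B) == modcat([matmul(a, b) for a, b in zip(A_blocks, B_blocks)])

# Property 4: the inverse of a concatenation is the concatenation of inverses.
prop4 = inverse(A) == modcat([inverse(a) for a in A_blocks])

# Property 5: a Kronecker product with a diagonal matrix is a modulo concatenation.
M = [[1, 2], [3, 4]]
d = [5, 7, 11]
D = [[d[i] if i == j else 0 for j in range(3)] for i in range(3)]
prop5 = kron(M, D) == modcat([[[di * x for x in row] for row in M] for di in d])

print(prop2, prop4, prop5)
```

Property 5 is the one that connects the Kronecker products with a diagonal matrix in Equation (95) to the modulo concatenation form.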

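Using the properties of the modulo concatenation and noting that the matrix $C_{H_0}$ is diagonal,
we can now rewrite Equation (95) as:

\begin{align}
\Sigma_k &= C_{H_0} - N \beta \rho \, (D_k^\dagger \otimes C_{H_0}) \left( N \beta \rho A_{k-1} \otimes C_{H_0} + I \right)^{-1} (D_k \otimes C_{H_0}) \notag \\
&= C_{H_0} - N \beta \rho \, \lfloor D_k^\dagger (C_{H_0})_{0,0}, \ldots, D_k^\dagger (C_{H_0})_{N-1,N-1} \rceil \notag \\
&\quad \cdot \lfloor N \beta \rho A_{k-1} (C_{H_0})_{0,0} + I, \ldots, N \beta \rho A_{k-1} (C_{H_0})_{N-1,N-1} + I \rceil^{-1} \notag \\
&\quad \cdot \lfloor D_k (C_{H_0})_{0,0}, \ldots, D_k (C_{H_0})_{N-1,N-1} \rceil \notag \\
&= C_{H_0} - N \beta \rho \, \lfloor D_k^\dagger (C_{H_0})_{0,0} \left( N \beta \rho A_{k-1} (C_{H_0})_{0,0} + I \right)^{-1} D_k (C_{H_0})_{0,0}, \ldots, \notag \\
&\qquad D_k^\dagger (C_{H_0})_{N-1,N-1} \left( N \beta \rho A_{k-1} (C_{H_0})_{N-1,N-1} + I \right)^{-1} D_k (C_{H_0})_{N-1,N-1} \rceil \tag{C.4}
\end{align}

which leads to (96).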

Appendix D 
Alternative truncated Gaussian lower bound 

In this appendix we derive an alternative bound that can replace the one in Theorem 3 without 
the need for the zero mean assumption. This bound is given by: 
The capacity of a channel with a quadratic power constraint is lower bounded by:



(qd). 



C > sup max max LB^q (j9^,r, /3,?7) — 



r;>0 r<N i<3< <^r(i+r,)2 



N/3 



V 



(D.5) 



and the capacity of a channel with a peak power constraint is lower bounded by: 

C > sup sup max LBJjPq (p^, r, r],^) log(e~'' — e^^) 

r)>0 g>r; '•<^ N 



(D.6) 



where 
(D.7)



r < N 



r = N 



(D.8) 



and $u \sim \mathcal{CN}(0, 1)$ is a proper complex Gaussian random variable.



The proof of this bound follows the same lines as the proof of Theorem 3 with one major
shortcut. We use the fact that the estimation error variance is a decreasing function of each of
the transmitted symbols' power. We thus upper bound it by the estimation error variance that is
achieved when all previous symbols used the minimal allowed transmission power. Taking into

2 

account the transmitted signal structure, (78), the bound results by replacing the term C by ^-^ 
in (108) and (113). 

As this bound does not use the convexity argument, (A.11)-(A.18), it does not require the
zero mean assumption and holds for any expectation of the channel.
Fig. 1. Effective coherence time of the AR1 channel model for various values of the channel forgetting factor.

Fig. 2. Upper and lower capacity bounds vs. SNR for AR1 fading channel (Tcq = 900) and quadratic power constraint
(a = 10).

Fig. 3. Upper and lower capacity bounds vs. SNR for AR1 fading channel (Teg = 2000, 10000, 50000) and quadratic power
constraint (a = 2).